Gibbs Measures and Phase Transitions on 
Sparse Random Graphs 

Amir Dembo*, Andrea Montanari*

Stanford University.
e-mail: amir@math.stanford.edu; montanari@stanford.edu

Abstract: Many problems of interest in computer science and information
theory can be phrased in terms of a probability distribution over
discrete variables associated to the vertices of a large (but finite) sparse
graph. In recent years, considerable progress has been achieved by viewing
these distributions as Gibbs measures and applying to their study
heuristic tools from statistical physics. We review this approach and
provide some results towards a rigorous treatment of these problems.

AMS 2000 subject classifications: Primary 60B10, 60G60, 82B20.
Keywords and phrases: Random graphs, Ising model, Gibbs measures,
Phase transitions, Spin models, Local weak convergence.



Contents

1 Introduction
  1.1 The Curie-Weiss model and some general definitions
  1.2 Graphical models: examples
  1.3 Detour: The Ising model on the integer lattice
2 Ising models on locally tree-like graphs
  2.1 Locally tree-like graphs and conditionally independent trees
  2.2 Ising models on conditionally independent trees
  2.3 Algorithmic implications: belief propagation
  2.4 Free entropy density, from trees to graphs
  2.5 Coexistence at low temperature
3 The Bethe-Peierls approximation
  3.1 Messages, belief propagation and Bethe equations
  3.2 The Bethe free entropy
  3.3 Examples: Bethe equations and free entropy
  3.4 Extremality, Bethe states and Bethe-Peierls approximation
4 Colorings of random graphs
  4.1 The phase diagram: a broad picture



*Research partially funded by NSF grant #DMS-0806211.


imsart-generic ver. 2009/08/13 file: full-version.tex date: October 28, 2009 



Dembo et al./Gibbs Measures on Sparse Random Graphs

  4.2 The COL-UNCOL transition
  4.3 Coexistence and clustering: the physicist's approach
5 Reconstruction and extremality
  5.1 Applications and related work
  5.2 Reconstruction on graphs: sphericity and tree-solvability
  5.3 Proof of main results
6 XORSAT and finite-size scaling
  6.1 XORSAT on random regular graphs
  6.2 Hyper-loops, hypergraphs, cores and a peeling algorithm
  6.3 The approximation by a smooth Markov kernel
  6.4 The ODE method and the critical value
  6.5 Diffusion approximation and scaling window
  6.6 Finite size scaling correction to the critical value
References



1. Introduction 

Statistical mechanics is a rich source of fascinating phenomena that can be, 
at least in principle, fully understood in terms of probability theory. Over 
the last two decades, probabilists have tackled this challenge with much suc- 
cess. Notable examples include percolation theory [49], interacting particle 
systems [61], and most recently, conformal invariance. Our focus here is on 
another area of statistical mechanics, the theory of Gibbs measures, which 
provides a very effective and flexible way to define collections of 'locally 
dependent' random variables. 

The general abstract theory of Gibbs measures is fully rigorous from a 
mathematical point of view [42]. However, when it comes to understanding 
the properties of specific Gibbs measures, i.e. of specific models, a large gap 
persists between physicists' heuristic methods and the scope of
mathematically rigorous techniques.

This paper is devoted to a somewhat non-standard family of models, namely
Gibbs measures on sparse random graphs. Classically, statistical mechanics
has been motivated by the desire to understand the physical behavior of
materials, for instance the phase changes of water under temperature change,
or the permeation of oil in a porous material. This naturally led to
three-dimensional models for such phenomena. The discovery of 'universality'
(i.e. the observation that many qualitative features do not depend on the
microscopic details of the system) led in turn to the study of models on
three-dimensional lattices, whereby the elementary degrees of freedom (spins)
are associated with the vertices of the lattice. Thereafter, d-dimensional
lattices (typically $\mathbb{Z}^d$) became the object of interest upon
realizing that significant insight can be gained through such a
generalization.

The study of statistical mechanics models 'beyond $\mathbb{Z}^d$' is not
directly motivated by physics considerations. Nevertheless, physicists have
been interested in models on other graph structures for quite a long time
(an early example is [36]). Appropriate graph structures can simplify
considerably the treatment of a specific model, and sometimes allow for
sharp predictions. Hopefully some qualitative features of these predictions
survive on $\mathbb{Z}^d$.

Recently this area has witnessed significant progress and renewed interest 
as a consequence of motivations coming from computer science, probabilis- 
tic combinatorics and statistical inference. In these disciplines, one is often 
interested in understanding the properties of (optimal) solutions of a large 
set of combinatorial constraints. As a typical example, consider a linear
system over GF[2], $Ax = b \mod 2$, with $A$ an $n \times n$ binary matrix
and $b$ a binary vector of length $n$. Assume that $A$ and $b$ are drawn
from a random matrix/vector ensemble. Typical questions are: What is the
probability that
such a linear system admits a solution? Assuming a typical realization does 
not admit a solution, what is the maximum number of equations that can, 
typically, be satisfied? 
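
The first of these questions can be explored numerically on small instances. The following is a minimal sketch, not taken from the paper: it assumes (purely for illustration) that the entries of A and b are i.i.d. uniform bits, and the helper names `gf2_solvable` and `solvable_fraction` are hypothetical. Solvability is decided by Gaussian elimination modulo 2.

```python
import random


def gf2_solvable(A, b):
    """Return True if the linear system A x = b (mod 2) has a solution.

    A is a list of rows (lists of 0/1), b is a list of 0/1 of equal length.
    """
    n_rows = len(A)
    n_cols = len(A[0]) if A else 0
    # Work on an augmented copy [A | b] so the inputs are left untouched.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    pivot_row = 0
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a 1 in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if M[r][col]), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        # Clear this column in every other row (addition mod 2 is XOR).
        for r in range(n_rows):
            if r != pivot_row and M[r][col]:
                M[r] = [u ^ v for u, v in zip(M[r], M[pivot_row])]
        pivot_row += 1
    # The system is inconsistent exactly when some row reads 0 = 1.
    return all(any(row[:-1]) or row[-1] == 0 for row in M)


def solvable_fraction(n, trials=200, seed=0):
    """Monte Carlo estimate of P{A x = b solvable} for uniform random bits.

    The uniform ensemble is an assumption made for this illustration only.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
        b = [rng.randint(0, 1) for _ in range(n)]
        hits += gf2_solvable(A, b)
    return hits / trials
```

Calling `solvable_fraction(n)` for a few values of `n` gives a rough empirical answer for this particular ensemble; sparse ensembles such as those studied later would require a different sampler.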

While probabilistic combinatorics developed a number of ingenious
techniques to deal with these questions, significant progress has been
achieved recently by employing novel insights from statistical physics
(see [65]). Specifically, one first defines a Gibbs measure associated to
each instance of the problem at hand, then analyzes its properties using
statistical physics techniques, such as the cavity method. While
non-rigorous, this approach appears to be very systematic and to provide
many sharp predictions.

It is clear at the outset that, for 'natural' distributions of the binary 
matrix A, the above problem does not have any d-dimensional structure. 
Similarly, in many interesting examples, one can associate to the Gibbs 
measure a graph that is sparse and random, but of no finite-dimensional 
structure. Non-rigorous statistical mechanics techniques appear to provide 
detailed predictions about general Gibbs measures of this type. It would be
highly desirable (and in principle possible) to develop a fully
mathematical theory of such Gibbs measures. The present paper provides a unified
presentation of a few results in this direction. 

In the rest of this section, we proceed with a more detailed overview of the 
topic, proposing certain fundamental questions the answer to which plays 
an important role within the non-rigorous statistical mechanics analysis. We 
illustrate these questions on the relatively well-understood Curie-Weiss (toy)
model and explore a few additional motivating examples.

Section 2 focuses on a specific example, namely the ferromagnetic Ising 
model on sequences of locally tree-like graphs. Thanks to its monotonicity 
properties, detailed information can be gained on this model. 

A recurring prediction of statistical mechanics studies is that Bethe- 
Peierls approximation is asymptotically tight in the large graph limit, for 
sequences of locally tree-like graphs. Section 3 provides a mathematical for- 
malization of Bethe-Peierls approximation. We also prove there that, under 
an appropriate correlation decay condition, Bethe-Peierls approximation is 
indeed essentially correct on graphs with large girth. 

In Section 4 we consider a more challenging, and as of now, poorly under- 
stood, example: proper colorings of a sparse random graph. A fascinating 
'clustering' phase transition is predicted to occur as the average degree of the 
graph crosses a certain threshold. Whereas the detailed description and ver- 
ification of this phase transition remains an open problem, its relation with 
the appropriate notion of correlation decay ('extremality'), is the subject of 
Section 5. 

Finally, it is common wisdom in statistical mechanics that phase transi- 
tions should be accompanied by a specific 'finite-size scaling' behavior. More 
precisely, a phase transition corresponds to a sharp change in some property 
of the model when a control parameter crosses a threshold. In a finite
system, the dependence on any control parameter is smooth, and the change
takes place in a window whose width decreases with the system size.
Finite-size scaling broadly refers to a description of the system behavior 
within this window. Section 6 presents a model in which finite-size scaling 
can be determined in detail. 

1.1. The Curie-Weiss model and some general definitions

The Curie-Weiss model is deceptively simple, but is a good framework to
start illustrating some important ideas. For a detailed study of this model
we refer to [37].

1.1.1. A story about opinion formation 

At time zero, each of $n$ individuals takes one of two opinions
$X_i(0) \in \{+1,-1\}$ independently and uniformly at random for
$i \in [n] = \{1,\dots,n\}$. At each subsequent time $t$, one individual
$i$, chosen uniformly at random, computes the opinion imbalance

$$ M \equiv \sum_{j=1}^{n} X_j \qquad (1.1) $$

and $M^{(i)} \equiv M - X_i$. Then, he/she changes his/her opinion with
probability

$$ p_{\rm flip}(\underline{X}) = \begin{cases} \exp(-2\beta |M^{(i)}|/n) & \text{if } M^{(i)} X_i > 0, \\ 1 & \text{otherwise.} \end{cases} \qquad (1.2) $$


Despite its simplicity, this model raises several interesting questions.

(a). How long does it take for the process $\underline{X}(t)$ to become
approximately stationary?

(b). How often do individuals change opinion in the stationary state?

(c). Is the typical opinion pattern strongly polarized (herding)?

(d). If this is the case, how often does the popular opinion change?
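
The opinion dynamics described above can be simulated directly. The following is a minimal sketch under a stated assumption: the flip rule is taken to be the Metropolis rule, in which an individual who currently agrees with the majority of the others flips with probability exp(-2β|M^(i)|/n), and flips with probability 1 otherwise. The function names and the parameter `seed` are illustrative, not the authors'.

```python
import math
import random


def flip_probability(x_i, m_cavity, n, beta):
    """Flip probability for an individual with opinion x_i.

    m_cavity is M^(i) = M - x_i, the imbalance of the other individuals.
    Assumed Metropolis rule: flipping away from the others' majority is
    exponentially suppressed, any other flip is always accepted.
    """
    if m_cavity * x_i > 0:
        return math.exp(-2.0 * beta * abs(m_cavity) / n)
    return 1.0


def simulate(n, beta, steps, seed=0):
    """Run the dynamics for a number of single-individual updates."""
    rng = random.Random(seed)
    x = [rng.choice((+1, -1)) for _ in range(n)]
    total = sum(x)  # opinion imbalance M
    for _ in range(steps):
        i = rng.randrange(n)  # individual chosen uniformly at random
        if rng.random() < flip_probability(x[i], total - x[i], n, beta):
            x[i] = -x[i]
            total += 2 * x[i]  # keep M in sync after the flip
    return x
```

Running `simulate` for increasing `beta` and recording `sum(x) / n` gives a quick empirical view of questions (b)-(d); it is an exploratory tool, not a substitute for the analysis that follows.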

We do not address question (a) here, but we will address some version of 
questions (b)-(d). More precisely, this dynamics (first studied in statistical 
physics under the name of Glauber or Metropolis dynamics) is an aperiodic 
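irreducible Markov chain whose unique stationary measure is

$$ \mu_{n,\beta}(\underline{x}) = \frac{1}{Z_n(\beta)} \exp\Big\{ \frac{\beta}{n} \sum_{(i,j)} x_i x_j \Big\} . \qquad (1.3) $$

To verify this, simply check that the dynamics given by (1.2) is reversible
with respect to the measure $\mu_{n,\beta}$ of (1.3). Namely, that
$\mu_{n,\beta}(\underline{x}) P(\underline{x} \to \underline{x}') =
\mu_{n,\beta}(\underline{x}') P(\underline{x}' \to \underline{x})$ for any
two configurations $\underline{x}$, $\underline{x}'$ (where
$P(\underline{x} \to \underline{x}')$ denotes the one-step transition
probability from $\underline{x}$ to $\underline{x}'$).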
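
The reversibility check can be carried out by brute force for small n. The sketch below assumes the Metropolis flip rule (flip probability exp(-2β|M^(i)|/n) when x_i agrees in sign with M^(i), and 1 otherwise) and the weight exp{(β/n) Σ_{i<j} x_i x_j}; the function names are illustrative.

```python
import itertools
import math


def weight(x, beta):
    """Unnormalized weight exp{(beta/n) * sum over pairs i<j of x_i x_j}."""
    n = len(x)
    pair_sum = (sum(x) ** 2 - n) // 2  # sum over unordered pairs
    return math.exp(beta * pair_sum / n)


def transition(x, i, beta):
    """One-step probability of moving from x to x with coordinate i flipped."""
    n = len(x)
    m_cavity = sum(x) - x[i]
    if m_cavity * x[i] > 0:
        p_flip = math.exp(-2.0 * beta * abs(m_cavity) / n)
    else:
        p_flip = 1.0
    return p_flip / n  # coordinate i is selected with probability 1/n


def max_detailed_balance_gap(n, beta):
    """Largest violation of w(x) P(x->y) = w(y) P(y->x) over single flips."""
    gap = 0.0
    for x in itertools.product((+1, -1), repeat=n):
        for i in range(n):
            y = x[:i] + (-x[i],) + x[i + 1:]
            lhs = weight(x, beta) * transition(x, i, beta)
            rhs = weight(y, beta) * transition(y, i, beta)
            gap = max(gap, abs(lhs - rhs))
    return gap
```

For small `n` the returned gap should be at the level of floating-point rounding, which is the numerical counterpart of the detailed balance identity.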

We are mostly interested in the large-$n$ (population size) behavior of
$\mu_{n,\beta}(\cdot)$ and its dependence on $\beta$ (the interaction
strength). In this context, we have the following 'static' versions of the
preceding questions:

(b'). What is the distribution of $p_{\rm flip}(\underline{x})$ when
$\underline{x}$ has distribution $\mu_{n,\beta}(\cdot)$?
(c'). What is the distribution of the opinion imbalance $M$? Is it
concentrated near $0$ (evenly spread opinions), or far from $0$ (herding)?
(d'). In the herding case: how unlikely are balanced ($M \approx 0$)
configurations?

1.1.2. Graphical models 

A graph $G = (V, E)$ consists of a set $V$ of vertices and a set $E$ of edges
(where an edge is an unordered pair of vertices). We always assume $G$ to
be finite with $|V| = n$ and often make the identification $V = [n]$. With
$\mathcal{X}$ a finite set, called the variable domain, we associate to each
vertex $i \in V$ a variable $x_i \in \mathcal{X}$, denoting by
$\underline{x} \in \mathcal{X}^V$ the complete assignment of these variables
and by $\underline{x}_U = \{x_i : i \in U\}$ its restriction to
$U \subseteq V$.

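Definition 1.1. A bounded specification $\psi \equiv \{\psi_{ij} : (i,j) \in E\}$
for a graph $G$ and variable domain $\mathcal{X}$ is a family of functionals
$\psi_{ij} : \mathcal{X} \times \mathcal{X} \to [0, \psi_{\max}]$ indexed by
the edges of $G$ with $\psi_{\max}$ a given finite, positive constant (where
for consistency $\psi_{ij}(x,x') = \psi_{ji}(x',x)$ for all
$x, x' \in \mathcal{X}$ and $(i,j) \in E$). The specification may include in
addition functions $\psi_i : \mathcal{X} \to [0, \psi_{\max}]$ indexed by
vertices of $G$.

A bounded specification $\psi$ for $G$ is permissive if there exists a
positive constant $\kappa$ and a 'permitted state' $x_i^{\rm p} \in \mathcal{X}$
for each $i \in V$, such that $\min_{i,x'} \psi_i(x') \ge \kappa \psi_{\max}$ and

$$ \min_{(i,j) \in E,\, x' \in \mathcal{X}} \psi_{ij}(x_i^{\rm p}, x') = \min_{(i,j) \in E,\, x' \in \mathcal{X}} \psi_{ij}(x', x_j^{\rm p}) \ge \kappa \psi_{\max} \equiv \psi_{\min} . $$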

The graphical model associated with a graph-specification pair $(G, \psi)$ is
the canonical probability measure

$$ \mu_{G,\psi}(\underline{x}) = \frac{1}{Z(G,\psi)} \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j) \prod_{i \in V} \psi_i(x_i) \qquad (1.4) $$

and the corresponding canonical stochastic process is the collection
$\underline{X} = \{X_i : i \in V\}$ of $\mathcal{X}$-valued random variables
having joint distribution $\mu_{G,\psi}(\cdot)$.
One such example is the distribution (1.3), where $\mathcal{X} = \{+1,-1\}$,
$G$ is the complete graph over $n$ vertices and
$\psi_{ij}(x_i, x_j) = \exp(\beta x_i x_j / n)$. Here $\psi_i(x) \equiv 1$.
It is sometimes convenient to introduce a 'magnetic field' (see for
instance Eq. (1.9) below). This corresponds to taking
$\psi_i(x_i) = \exp(B x_i)$.

Rather than studying graphical models at this level of generality, we focus 
on a few concepts/tools that have been the subject of recent research efforts. 



Coexistence. Roughly speaking, we say that a model $(G, \psi)$ exhibits
coexistence if the corresponding measure $\mu_{G,\psi}(\cdot)$ decomposes
into a convex combination of well-separated lumps. To formalize this notion,
we consider sequences of measures $\mu_n$ on graphs $G_n = ([n], E_n)$, and
say that coexistence occurs if, for each $n$, there exists a partition
$\Omega_{1,n}, \dots, \Omega_{r,n}$ of the configuration space
$\mathcal{X}^n$ with $r = r(n) \ge 2$, such that

(a). The measure of elements of the partition is uniformly bounded away
from one:

$$ \max_{1 \le s \le r} \mu_n(\Omega_{s,n}) \le 1 - \delta . \qquad (1.5) $$

(b). The elements of the partition are separated by 'bottlenecks'. That is,
for some $\epsilon > 0$,

$$ \max_{1 \le s \le r} \frac{\mu_n(\partial_\epsilon \Omega_{s,n})}{\mu_n(\Omega_{s,n})} \to 0 , \qquad (1.6) $$

as $n \to \infty$, where $\partial_\epsilon \Omega$ denotes the
$\epsilon$-boundary of $\Omega \subseteq \mathcal{X}^n$,

$$ \partial_\epsilon \Omega \equiv \{ \underline{x} \in \mathcal{X}^n : 1 \le d(\underline{x}, \Omega) \le n\epsilon \} , \qquad (1.7) $$

with respect to the Hamming¹ distance. The normalization by
$\mu_n(\Omega_{s,n})$ removes 'false bottlenecks' and is in particular
needed since $r(n)$ often grows (exponentially) with $n$.

Depending on the circumstances, one may further specify a required
rate of decay in (1.6).

We often consider families of models indexed by one (or more) continuous
parameters, such as the inverse temperature $\beta$ in the Curie-Weiss model.
A phase transition will generically be a sharp threshold in some property of
the measure $\mu(\cdot)$ as one of these parameters changes. In particular, a
phase transition can separate values of the parameter for which coexistence
occurs from those values for which it does not.

Mean field models. Intuitively, these are models that lack any
(finite-dimensional) geometrical structure. For instance, models of the form
(1.4) with $\psi_{ij}$ independent of $(i,j)$ and $G$ the complete graph or a
regular random graph are mean field models, whereas models in which $G$ is a
finite subset of a finite dimensional lattice are not. To be a bit more
precise, the Curie-Weiss model belongs to a particular class of mean field
models in which the measure $\mu(\underline{x})$ is exchangeable (that is,
invariant under coordinate permutations). A wider class of mean field models
may be obtained by considering random distributions² $\mu(\cdot)$ (for
example, when either $G$ or $\psi$ are chosen at random in (1.4)). In this
context, given a realization of $\mu$, consider $k$ i.i.d.



configurations $\underline{X}^{(1)}, \dots, \underline{X}^{(k)}$, each having
distribution $\mu$. These 'replicas' have the unconditional, joint
distribution

$$ \mu^{(k)}(\underline{x}^{(1)}, \dots, \underline{x}^{(k)}) = \mathbb{E}\big\{ \mu(\underline{x}^{(1)}) \cdots \mu(\underline{x}^{(k)}) \big\} . \qquad (1.8) $$

The random distribution $\mu$ is a candidate to be a mean field model when
for each fixed $k$ the measure $\mu^{(k)}$, viewed as a distribution over
$(\mathcal{X}^k)^n$, is exchangeable (with respect to permutations of the
coordinate indices in $[n]$). Unfortunately, while this property suffices in
many 'natural' special cases, there are models that intuitively are not
mean-field and yet have it. For instance, given a non-random measure $\nu$
and a uniformly random permutation $\pi$, the random distribution
$\mu(x_1, \dots, x_n) \equiv \nu(x_{\pi(1)}, \dots, x_{\pi(n)})$
meets the preceding requirement yet should not be considered a mean field
model. While a satisfactory mathematical definition of the notion of mean
field models is lacking, by focusing on selective examples we examine in the
sequel the rich array of interesting phenomena that such models exhibit.

¹The Hamming distance $d(\underline{x}, \underline{x}')$ between
configurations $\underline{x}$ and $\underline{x}'$ is the number of
positions in which the two configurations differ. Given
$\Omega \subseteq \mathcal{X}^n$, $d(\underline{x}, \Omega) \equiv
\min\{ d(\underline{x}, \underline{x}') : \underline{x}' \in \Omega \}$.

²A random distribution over $\mathcal{X}^n$ is just a random variable taking
values on the $(|\mathcal{X}|^n - 1)$-dimensional probability simplex.

Mean field equations. Distinct variables may be correlated in the model
(1.4) in very subtle ways. Nevertheless, mean field models are often tractable
because an effective 'reduction' to local marginals³ takes place
asymptotically for large sizes (i.e. as $n \to \infty$).

Thanks to this reduction it is often possible to write a closed system of
equations for the local marginals that hold in the large size limit and
determine the local marginals, up to possibly having finitely many solutions.
Finding the 'correct' mathematical definition of this notion is an open
problem, so we shall instead provide specific examples of such equations in
a few special cases of interest (starting with the Curie-Weiss model).

1.1.3. Coexistence in the Curie-Weiss model 

The model (1.3) appeared for the first time in the physics literature as a
model for ferromagnets⁴. In this context, the variables $x_i$ are called spins and
their value represents the direction in which a localized magnetic moment 
(think of a tiny compass needle) is pointing. In certain materials the different 
magnetic moments favor pointing in the same direction, and physicists want 
to know whether such interaction may lead to a macroscopic magnetization 
(imbalance), or not. 

³In particular, single variable marginals, or joint distributions of two
variables connected by an edge.

⁴A ferromagnet is a material that acquires a macroscopic spontaneous
magnetization at low temperature.

In studying this and related problems it often helps to slightly generalize
the model by introducing a linear term in the exponent (also called a
'magnetic field'). More precisely, one considers the probability measures

$$ \mu_{n,\beta,B}(\underline{x}) = \frac{1}{Z_n(\beta,B)} \exp\Big\{ \frac{\beta}{n} \sum_{(i,j)} x_i x_j + B \sum_{i=1}^{n} x_i \Big\} . \qquad (1.9) $$

In this context $1/\beta$ is referred to as the 'temperature' and we shall
always assume that $\beta \ge 0$ and, without loss of generality, also that
$B \ge 0$.

The following estimates on the distribution of the magnetization per site
are the key to our understanding of the large size behavior of the
Curie-Weiss model (1.9).

Lemma 1.2. Let $H(x) = -x \log x - (1-x) \log(1-x)$ denote the binary
entropy function and for $\beta \ge 0$, $B \in \mathbb{R}$ and
$m \in [-1,+1]$ set

$$ \varphi(m) \equiv \varphi_{\beta,B}(m) = Bm + \frac{1}{2} \beta m^2 + H\Big( \frac{1+m}{2} \Big) . \qquad (1.10) $$

Then, for $\overline{X} \equiv n^{-1} \sum_{i=1}^{n} X_i$, a random
configuration $(X_1, \dots, X_n)$ from the Curie-Weiss model and each
$m \in S_n \equiv \{-1, -1+2/n, \dots, 1-2/n, 1\}$,

$$ \frac{e^{-\beta/2}}{n+1} \frac{1}{Z_n(\beta,B)} e^{n\varphi(m)} \le \mathbb{P}\{\overline{X} = m\} \le \frac{1}{Z_n(\beta,B)} e^{n\varphi(m)} . \qquad (1.11) $$

Proof. Noting that for $M = nm$,

$$ \mathbb{P}\{\overline{X} = m\} = \frac{1}{Z_n(\beta,B)} \binom{n}{(n+M)/2} \exp\Big\{ BM + \frac{\beta M^2}{2n} - \frac{\beta}{2} \Big\} , $$

our thesis follows by Stirling's approximation of the binomial coefficient
(for example, see [24, Theorem 12.1.3]). □
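
The two-sided bound of Lemma 1.2 can be checked numerically for moderate n. The sketch below is illustrative only: it assumes the exact law of the magnetization is the binomial coefficient times exp{BM + βM²/(2n) - β/2}, normalized, and compares its logarithm with the logarithms of the two bounds; the function names and the small tolerance are choices made here, not part of the paper.

```python
import math


def binary_entropy(p):
    """H(p) = -p log p - (1-p) log(1-p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def phi(m, beta, B):
    """The function phi_{beta,B}(m) = B m + beta m^2 / 2 + H((1+m)/2)."""
    return B * m + 0.5 * beta * m * m + binary_entropy((1.0 + m) / 2.0)


def log_weights(n, beta, B):
    """log of binom(n,(n+M)/2) exp{B M + beta M^2/(2n) - beta/2}, keyed by M."""
    out = {}
    for k in range(n + 1):  # k = number of +1 spins, so M = 2k - n
        M = 2 * k - n
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1))
        out[M] = log_binom + B * M + beta * M * M / (2.0 * n) - beta / 2.0
    return out


def check_lemma_1_2(n, beta, B):
    """Return True if both bounds hold for every m in S_n (up to rounding)."""
    logw = log_weights(n, beta, B)
    top = max(logw.values())
    log_Z = top + math.log(sum(math.exp(v - top) for v in logw.values()))
    for M, v in logw.items():
        log_p = v - log_Z
        log_upper = n * phi(M / n, beta, B) - log_Z
        log_lower = log_upper - beta / 2.0 - math.log(n + 1)
        if not (log_lower - 1e-9 <= log_p <= log_upper + 1e-9):
            return False
    return True
```

Working with logarithms avoids overflow in the binomial coefficient and in the exponential factor for larger values of `n`.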

A major role in determining the asymptotic properties of the measures
$\mu_{n,\beta,B}$ is played by the free entropy density (the term 'density'
refers here to the fact that we are dividing by the number of variables),

$$ \phi_n(\beta,B) = \frac{1}{n} \log Z_n(\beta,B) . \qquad (1.12) $$

Lemma 1.3. For all $n$ large enough we have the following bounds on the
free entropy density $\phi_n(\beta,B)$ of the (generalized) Curie-Weiss model

$$ \phi_*(\beta,B) - \frac{\beta}{2n} - \frac{1}{n} \log\{n(n+1)\} \le \phi_n(\beta,B) \le \phi_*(\beta,B) + \frac{1}{n} \log(n+1) , $$

where

$$ \phi_*(\beta,B) \equiv \sup\{ \varphi_{\beta,B}(m) : m \in [-1,1] \} . \qquad (1.13) $$

Proof. The upper bound follows upon summing over $m \in S_n$ the upper
bound in (1.11). Further, from the lower bound in (1.11) we get that

$$ \phi_n(\beta,B) \ge \max\{ \varphi_{\beta,B}(m) : m \in S_n \} - \frac{\beta}{2n} - \frac{1}{n} \log(n+1) . $$

A little calculus shows that the maximum of $\varphi_{\beta,B}(\cdot)$ over
the finite set $S_n$ is not smaller than its maximum over the interval
$[-1,+1]$ minus $n^{-1}(\log n)$, for all $n$ large enough. □

Consider the optimization problem in Eq. (1.13). Since
$\varphi_{\beta,B}(\cdot)$ is continuous on $[-1,1]$ and differentiable in
its interior, with $\varphi'_{\beta,B}(m) \to \pm\infty$ as $m \to \mp 1$,
this maximum is achieved at one of the points $m \in (-1,1)$
where $\varphi'_{\beta,B}(m) = 0$. A direct calculation shows that the latter
condition is equivalent to

$$ m = \tanh(\beta m + B) . \qquad (1.14) $$

Analyzing the possible solutions of this equation, one finds out that:

(a). For $\beta \le 1$, the equation (1.14) admits a unique solution
$m_*(\beta,B)$ increasing in $B$ with $m_*(\beta,B) \downarrow 0$ as
$B \downarrow 0$. Obviously, $m_*(\beta,B)$ maximizes $\varphi_{\beta,B}(m)$.

(b). For $\beta > 1$ there exists $B_*(\beta) > 0$ continuously increasing in
$\beta$ with $\lim_{\beta \downarrow 1} B_*(\beta) = 0$ such that: (i) for
$0 \le B < B_*(\beta)$, Eq. (1.14) admits three distinct solutions
$m_-(\beta,B)$, $m_0(\beta,B)$, $m_+(\beta,B) \equiv m_*(\beta,B)$ with
$m_- < m_0 \le 0 < m_+ \equiv m_*$; (ii) for $B = B_*(\beta)$ the solutions
$m_-(\beta,B) = m_0(\beta,B)$ coincide; (iii) and for $B > B_*(\beta)$ only
the positive solution $m_*(\beta,B)$ survives.

Further, for $B \ge 0$ the global maximum of $\varphi_{\beta,B}(m)$ over
$m \in [-1,1]$ is attained at $m = m_*(\beta,B)$, while $m_0(\beta,B)$ and
$m_-(\beta,B)$ are (respectively) a local minimum and a local maximum (and a
saddle point when they coincide at $B = B_*(\beta)$). Since
$\varphi_{\beta,0}(\cdot)$ is an even function, in particular
$m_0(\beta,0) = 0$ and $m_\pm(\beta,0) = \pm m_*(\beta,0)$.
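
The solution structure of m = tanh(βm + B) described in (a) and (b) can be explored numerically. The following sketch is illustrative and makes two assumptions that are not in the paper: the largest solution is found by iterating the map from m = 1 (the iterates decrease monotonically when B ≥ 0), and the remaining solutions are located by bisection on a uniform grid, which will miss a tangential (double) root such as the one at B = B_*(β).

```python
import math


def solve_mean_field(beta, B, m0=1.0, tol=1e-12, max_iter=10_000):
    """Iterate m <- tanh(beta*m + B) starting from m0.

    With B >= 0 and m0 = 1 the iterates decrease to the largest solution.
    Convergence is slow near beta = 1, B = 0.
    """
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(beta * m + B)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def all_solutions(beta, B, grid=2000, tol=1e-12):
    """Locate solutions in [-1, 1] via sign changes of tanh(beta*m + B) - m."""
    def g(m):
        return math.tanh(beta * m + B) - m

    roots = []
    for j in range(grid):
        a = -1.0 + 2.0 * j / grid
        b = -1.0 + 2.0 * (j + 1) / grid
        ga, gb = g(a), g(b)
        if ga == 0.0:
            roots.append(a)  # a grid point that is an exact root (e.g. m = 0)
        elif ga * gb < 0.0:
            while b - a > tol:  # plain bisection on the bracketing interval
                mid = 0.5 * (a + b)
                if g(a) * g(mid) <= 0.0:
                    b = mid
                else:
                    a = mid
            roots.append(0.5 * (a + b))
    return roots
```

For example, `all_solutions(1.5, 0.0)` returns three values, the outer two being symmetric about zero, while `all_solutions(0.5, 0.2)` returns a single value.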

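Our next theorem answers question (c') of Section 1.1.1 for the
Curie-Weiss model.

Theorem 1.4. Consider $\overline{X}$ of Lemma 1.2 and the relevant solution
$m_*(\beta,B)$ of equation (1.14). If either $\beta \le 1$ or $B > 0$, then
for any $\epsilon > 0$ there exists $C(\epsilon) > 0$ such that, for all $n$
large enough

$$ \mathbb{P}\{ |\overline{X} - m_*(\beta,B)| \le \epsilon \} \ge 1 - e^{-nC(\epsilon)} . \qquad (1.15) $$

In contrast, if $B = 0$ and $\beta > 1$, then for any $\epsilon > 0$ there
exists $C(\epsilon) > 0$ such that, for all $n$ large enough

$$ \mathbb{P}\{ |\overline{X} - m_*(\beta,0)| \le \epsilon \} = \mathbb{P}\{ |\overline{X} + m_*(\beta,0)| \le \epsilon \} \ge \frac{1}{2} - e^{-nC(\epsilon)} . \qquad (1.16) $$

Proof. Suppose first that either $\beta \le 1$ or $B > 0$, in which case
$\varphi_{\beta,B}(m)$ has the unique non-degenerate global maximizer
$m_* = m_*(\beta,B)$. Fixing $\epsilon > 0$ and setting
$I_\epsilon = [-1, m_* - \epsilon] \cup [m_* + \epsilon, 1]$, by Lemma 1.2

$$ \mathbb{P}\{ \overline{X} \in I_\epsilon \} \le \frac{1}{Z_n(\beta,B)} (n+1) \exp\big\{ n \max[ \varphi_{\beta,B}(m) : m \in I_\epsilon ] \big\} . $$

Using Lemma 1.3 we then find that

$$ \mathbb{P}\{ \overline{X} \in I_\epsilon \} \le (n+1)^3 e^{\beta/2} \exp\big\{ n \max[ \varphi_{\beta,B}(m) - \phi_*(\beta,B) : m \in I_\epsilon ] \big\} , $$

whence the bound of (1.15) follows.

The bound of (1.16) is proved analogously, using the fact that
$\mu_{n,\beta,0}(\underline{x}) = \mu_{n,\beta,0}(-\underline{x})$. □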
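
The concentration statements of Theorem 1.4 can be observed on the exact law of the magnetization. The following sketch is illustrative: it assumes that the probability of total magnetization M is proportional to the binomial coefficient times exp{BM + βM²/(2n)}, computes the largest solution of m = tanh(βm + B) by iterating from m = 1, and sums the probability of the window of half-width ε around it. The function names are hypothetical.

```python
import math


def magnetization_law(n, beta, B):
    """Exact law of M = sum_i X_i, as a dict M -> probability."""
    logw = {}
    for k in range(n + 1):  # k = number of +1 spins
        M = 2 * k - n
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1))
        logw[M] = log_binom + B * M + beta * M * M / (2.0 * n)
    top = max(logw.values())  # subtract the maximum to avoid overflow
    norm = sum(math.exp(v - top) for v in logw.values())
    return {M: math.exp(v - top) / norm for M, v in logw.items()}


def largest_solution(beta, B, iters=5000):
    """Largest solution of m = tanh(beta*m + B), by iteration from m = 1."""
    m = 1.0
    for _ in range(iters):
        m = math.tanh(beta * m + B)
    return m


def mass_near(n, beta, B, eps):
    """P{ |X_bar - m_star| <= eps } computed from the exact law."""
    m_star = largest_solution(beta, B)
    law = magnetization_law(n, beta, B)
    return sum(p for M, p in law.items() if abs(M / n - m_star) <= eps)
```

With a positive field the window carries almost all of the mass, whereas at zero field and β > 1 it carries close to one half, the other half sitting near the opposite solution.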

We just encountered our first example of coexistence (and of phase tran- 
sition). 

Theorem 1.5. The Curie-Weiss model shows coexistence if and only if
$B = 0$ and $\beta > 1$.

Proof. We will limit ourselves to the 'if' part of this statement: for
$B = 0$, $\beta > 1$, the Curie-Weiss model shows coexistence. To this end,
we simply check that the partition of the configuration space
$\{+1,-1\}^n$ to $\Omega_+ \equiv \{\underline{x} : \sum_i x_i \ge 0\}$ and
$\Omega_- \equiv \{\underline{x} : \sum_i x_i < 0\}$ satisfies the conditions
in Section 1.1.2. Indeed, it follows immediately from (1.16) that choosing a
positive $\epsilon < m_*(\beta,0)/2$, we have

$$ \mu_{n,\beta,0}(\Omega_\pm) \ge \frac{1}{2} - e^{-Cn} , \qquad \mu_{n,\beta,0}(\partial_\epsilon \Omega_\pm) \le e^{-Cn} , $$

for some $C > 0$ and all $n$ large enough, which is the thesis. □

1.1.4. Curie-Weiss model: Mean field equations

We have just encountered our first example of coexistence and our first
example of phase transition. We further claim that the identity (1.14) can
be 'interpreted' as our first example of a mean field equation (in line with
the discussion of Section 1.1.2). Indeed, assuming throughout this section
not to be on the coexistence line $B = 0$, $\beta > 1$, it follows from
Theorem 1.4 that $\mathbb{E} X_i = \mathbb{E} \overline{X} \approx
m_*(\beta,B)$.⁵ Therefore, the identity (1.14) can be rephrased as

$$ \mathbb{E} X_i \approx \tanh\Big\{ B + \frac{\beta}{n} \sum_{j \in V} \mathbb{E} X_j \Big\} , \qquad (1.17) $$

which, in agreement with our general description of mean field equations,
is a closed form relation between the local marginals under the measure
$\mu_{n,\beta,B}(\cdot)$.

We next re-derive the equation (1.17) directly out of the concentration
in probability of $\overline{X}$. This approach is very useful, for in more
complicated models one often has mild bounds on the fluctuations of
$\overline{X}$ while lacking fine controls such as in Theorem 1.4. To this
end, we start by proving the following 'cavity' estimate.⁶

Lemma 1.6. Denote by $\mathbb{E}_{n,\beta}$ and ${\rm Var}_{n,\beta}$ the
expectation and variance with respect to the Curie-Weiss model with $n$
variables at inverse temperature $\beta$ (and magnetic field $B$). Then, for
$\beta' = \beta(1 + 1/n)$, $\overline{X} = n^{-1} \sum_{i=1}^{n} X_i$ and
any $i \in [n]$,

$$ \big| \mathbb{E}_{n+1,\beta'} X_i - \mathbb{E}_{n,\beta} X_i \big| \le \beta \sinh(B + \beta) \sqrt{ {\rm Var}_{n,\beta}(\overline{X}) } . \qquad (1.18) $$

Proof. By direct computation, for any function
$F : \{+1,-1\}^n \to \mathbb{R}$,

$$ \mathbb{E}_{n+1,\beta'}\{ F(\underline{X}) \} = \frac{ \mathbb{E}_{n,\beta}\{ F(\underline{X}) \cosh(B + \beta \overline{X}) \} }{ \mathbb{E}_{n,\beta}\{ \cosh(B + \beta \overline{X}) \} } . $$

Therefore, with $\cosh(a) \ge 1$ we get by Cauchy-Schwarz that

$$ \big| \mathbb{E}_{n+1,\beta'}\{ F(\underline{X}) \} - \mathbb{E}_{n,\beta}\{ F(\underline{X}) \} \big| \le \big| {\rm Cov}_{n,\beta}\{ F(\underline{X}), \cosh(B + \beta \overline{X}) \} \big| $$
$$ \le \|F\|_\infty \sqrt{ {\rm Var}_{n,\beta}( \cosh(B + \beta \overline{X}) ) } \le \|F\|_\infty \, \beta \sinh(B + \beta) \sqrt{ {\rm Var}_{n,\beta}(\overline{X}) } , $$

where the last inequality is due to the Lipschitz behavior of
$x \mapsto \cosh(B + \beta x)$ together with the bound
$|\overline{X}| \le 1$. □

⁵We use $\approx$ to indicate that we do not provide the approximation
error, nor plan to rigorously prove that it is small.

⁶Cavity methods of statistical physics aim at understanding thermodynamic
limits $n \to \infty$ by first relating certain quantities for systems of
size $n \gg 1$ to those in systems of size $n' = n + O(1)$.

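The following theorem provides a rigorous version of Eq. (1.17) for
$\beta \le 1$ or $B > 0$.

Theorem 1.7. There exists a constant $C(\beta,B)$ such that for any
$i \in [n]$,

$$ \Big| \mathbb{E} X_i - \tanh\Big\{ B + \frac{\beta}{n} \sum_{j \in V} \mathbb{E} X_j \Big\} \Big| \le C(\beta,B) \sqrt{ {\rm Var}(\overline{X}) } . \qquad (1.19) $$

Proof. In the notations of Lemma 1.6 recall that
$\mathbb{E}_{n+1,\beta'} X_i$ is independent of $i$ and so upon fixing
$(X_1, \dots, X_n)$ we get by direct computation that

$$ \mathbb{E}_{n+1,\beta'}\{ X_i \} = \mathbb{E}_{n+1,\beta'}\{ X_{n+1} \} = \frac{ \mathbb{E}_{n,\beta} \sinh(B + \beta \overline{X}) }{ \mathbb{E}_{n,\beta} \cosh(B + \beta \overline{X}) } . $$

Further notice that (by the Lipschitz property of $\cosh(B + \beta x)$ and
$\sinh(B + \beta x)$ together with the bound $|\overline{X}| \le 1$),

$$ \big| \mathbb{E}_{n,\beta} \sinh(B + \beta \overline{X}) - \sinh(B + \beta \mathbb{E}_{n,\beta} \overline{X}) \big| \le \beta \cosh(B + \beta) \sqrt{ {\rm Var}_{n,\beta}(\overline{X}) } , $$

$$ \big| \mathbb{E}_{n,\beta} \cosh(B + \beta \overline{X}) - \cosh(B + \beta \mathbb{E}_{n,\beta} \overline{X}) \big| \le \beta \sinh(B + \beta) \sqrt{ {\rm Var}_{n,\beta}(\overline{X}) } . $$

Using the inequality $|a_1/b_1 - a_2/b_2| \le |a_1 - a_2|/b_1 +
a_2 |b_1 - b_2| / (b_1 b_2)$ we thus have here (with $a_i \ge 0$ and
$b_i \ge \max(1, a_i)$), that

$$ \Big| \mathbb{E}_{n+1,\beta'}\{ X_i \} - \tanh\Big\{ B + \frac{\beta}{n} \sum_{j=1}^{n} \mathbb{E}_{n,\beta} X_j \Big\} \Big| \le C(\beta,B) \sqrt{ {\rm Var}_{n,\beta}(\overline{X}) } . $$

At this point you get our thesis by applying Lemma 1.6. □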
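
The mean field relation can be compared with exact moments for moderate n. The sketch below is illustrative: it assumes the probability of total magnetization M is proportional to the binomial coefficient times exp{BM + βM²/(2n)}, uses exchangeability to identify the expectation of a single spin with the expectation of the magnetization per site, and returns the left-hand side of the bound together with the standard deviation of the magnetization. The constant C(β,B) is not computed, and the function names are hypothetical.

```python
import math


def magnetization_moments(n, beta, B):
    """Exact mean and variance of X_bar = M/n for the Curie-Weiss model."""
    logw = {}
    for k in range(n + 1):  # k = number of +1 spins, M = 2k - n
        M = 2 * k - n
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1))
        logw[M] = log_binom + B * M + beta * M * M / (2.0 * n)
    top = max(logw.values())
    weights = {M: math.exp(v - top) for M, v in logw.items()}
    Z = sum(weights.values())
    mean = sum((M / n) * w for M, w in weights.items()) / Z
    second = sum((M / n) ** 2 * w for M, w in weights.items()) / Z
    return mean, second - mean * mean


def mean_field_gap(n, beta, B):
    """Return (|E X_i - tanh(B + beta E X_bar)|, sqrt(Var X_bar)).

    By exchangeability E X_i = E X_bar and (beta/n) sum_j E X_j = beta E X_bar.
    """
    mean, var = magnetization_moments(n, beta, B)
    gap = abs(mean - math.tanh(B + beta * mean))
    return gap, math.sqrt(max(var, 0.0))
```

Away from the coexistence line both returned numbers shrink as `n` grows. On the coexistence line the gap vanishes by symmetry while the standard deviation stays of order one, which is why a bound of this form carries no information there.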



1.2. Graphical models: examples 

We next list a few examples of graphical models, originating from different
domains of science and engineering. Several other examples that fit the same
framework are discussed in detail in [65]. 

1.2.1. Statistical physics

Ferromagnetic Ising model. The ferromagnetic Ising model is arguably
the most studied model in statistical physics. It is defined by the Boltzmann
distribution

$$ \mu_{\beta,B}(\underline{x}) = \frac{1}{Z(\beta,B)} \exp\Big\{ \beta \sum_{(i,j) \in E} x_i x_j + B \sum_{i \in V} x_i \Big\} , \qquad (1.20) $$

over $\underline{x} = \{x_i : i \in V\}$, with $x_i \in \{+1,-1\}$,
parametrized by the 'magnetic field' $B \in \mathbb{R}$ and 'inverse
temperature' $\beta \ge 0$, where the partition function $Z(\beta,B)$ is
fixed by the normalization condition
$\sum_{\underline{x}} \mu(\underline{x}) = 1$. The interaction between
vertices $i, j$ connected by an edge pushes the variables $x_i$ and $x_j$
towards taking the same value. It is expected that this leads to a global
alignment of the variables (spins) at low temperature, for a large family
of graphs. This transition should be analogous to the one we found for the
Curie-Weiss model, but remarkably little is known about Ising models on
general graphs. In Section 2 we consider the case of random sparse graphs.

Anti-ferromagnetic Ising model. This model takes the same form (1.20),
but with $\beta < 0$.⁷ Note that if $B = 0$ and the graph is bipartite (i.e.
if there exists a partition $V = V_1 \cup V_2$ such that
$E \subseteq V_1 \times V_2$), then this model is equivalent to the
ferromagnetic one (upon inverting the signs of $\{x_i, i \in V_1\}$).
However, on non-bipartite graphs the anti-ferromagnetic model is considerably
more complicated than the ferromagnetic one, and even determining the most
likely (lowest energy) configuration is a difficult matter. Indeed, for
$B = 0$ the latter is equivalent to the celebrated max-cut problem from
theoretical computer science.
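
The equivalence with max-cut can be seen on small graphs by exhaustive search. The sketch below is illustrative: it assumes vertices are labeled 0, ..., n-1 and edges are given as pairs, and it uses the identity Σ x_i x_j = |E| - 2·cut(x), where cut(x) counts the edges whose endpoints carry opposite spins. For negative inverse temperature and zero field, the most likely configurations are those minimizing Σ x_i x_j over edges.

```python
import itertools


def cut_size(x, edges):
    """Number of edges whose endpoints take opposite values."""
    return sum(1 for i, j in edges if x[i] != x[j])


def edge_sum(x, edges):
    """Sum over edges of x_i x_j; equals len(edges) - 2 * cut_size(x, edges)."""
    return sum(x[i] * x[j] for i, j in edges)


def best_configurations(n, edges):
    """Compare minimizers of the edge sum with maximizers of the cut.

    Exhaustive over all 2^n configurations, so only usable for small n.
    """
    configs = list(itertools.product((+1, -1), repeat=n))
    min_sum = min(edge_sum(x, edges) for x in configs)
    max_cut = max(cut_size(x, edges) for x in configs)
    minimizers = {x for x in configs if edge_sum(x, edges) == min_sum}
    maximizers = {x for x in configs if cut_size(x, edges) == max_cut}
    return minimizers, maximizers, max_cut
```

On a triangle, for instance, at most two of the three edges can be cut, and the two returned sets coincide, as they must on any graph.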

Spin glasses. An instance of the Ising spin glass is defined by a graph $G$,
together with edge weights $J_{ij} \in \mathbb{R}$, for $(i,j) \in E$. Again
variables are binary $x_i \in \{+1,-1\}$ and

$$ \mu_{\beta,B,J}(\underline{x}) = \frac{1}{Z(\beta,B,J)} \exp\Big\{ \beta \sum_{(i,j) \in E} J_{ij} x_i x_j + B \sum_{i \in V} x_i \Big\} . \qquad (1.21) $$

In a spin glass model the 'coupling constants' $J_{ij}$ are random with even
distribution (the canonical examples being $J_{ij} \in \{+1,-1\}$ uniformly
and $J_{ij}$ centered Gaussian variables). One is interested in determining
the asymptotic properties as $n = |V| \to \infty$ of
$\mu_{n,\beta,B,J}(\cdot)$ for a typical realization of the coupling
$J \equiv \{J_{ij}\}$.

⁷In the literature one usually introduces explicitly a minus sign to keep
$\beta$ positive.

Fig 1. Factor graph representation of the satisfiability formula
$(x_1 \vee x_2 \vee x_4) \wedge (x_1 \vee x_2) \wedge (x_2 \vee x_4 \vee x_5)
\wedge (x_1 \vee x_2 \vee x_5) \wedge (x_1 \vee x_2 \vee x_5)$. Edges are
continuous or dashed depending on whether the corresponding variable is
directed or negated in the clause.

1.2.2. Random constraint satisfaction problems 

A constraint satisfaction problem (CSP) consists of a finite set
$\mathcal{X}$ (called the variable domain), and a class $\mathcal{C}$ of
possible constraints (i.e. indicator functions), each of which involves
finitely many $\mathcal{X}$-valued variables $x_i$. An instance of this
problem is then specified by a positive integer $n$ (the number of
variables), and a set of $m$ constraints involving only the variables
$x_1, \dots, x_n$ (or a subset thereof). A solution of this instance is an
assignment in $\mathcal{X}^n$ for the variables $x_1, \dots, x_n$ which
satisfies all $m$ constraints.

In this context, several questions are of interest within computer science: 

1. Decision problem. Does the given instance have a solution? 

2. Optimization problem. Maximize the number of satisfied constraints. 

3. Counting problem. Count the number of solutions. 

There are many ways of associating a graphical model to an instance of 
CSP. If the instance admits a solution, then one option is to consider the 
uniform measure over all such solutions. Let us see how this works in a few 
examples. 

Coloring. A proper $q$-coloring of a graph $G$ is an assignment of colors in
$[q]$ to the vertices of $G$ such that no edge has both endpoints of the same
color. The corresponding CSP has variable domain $\mathcal{X} = [q]$ and the
possible constraints in $\mathcal{C}$ are indexed by pairs of indices
$(i,j) \in V \times V$, where the constraint $(i,j)$ is satisfied if and only
if $x_i \ne x_j$.
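
For a small graph, proper q-colorings can be enumerated by brute force; their number is the normalizing constant of the uniform measure over solutions. The sketch below is illustrative and assumes vertices are labeled 0, ..., n-1 and colors 0, ..., q-1; the function names are hypothetical, and the enumeration takes time q^n.

```python
import itertools


def proper_colorings(n, edges, q):
    """List all assignments in {0,...,q-1}^n with x_i != x_j on every edge."""
    return [x for x in itertools.product(range(q), repeat=n)
            if all(x[i] != x[j] for i, j in edges)]


def uniform_measure(n, edges, q):
    """Uniform measure over proper q-colorings, as a dict coloring -> mass."""
    sols = proper_colorings(n, edges, q)
    if not sols:
        raise ValueError("graph admits no proper q-coloring")
    return {x: 1.0 / len(sols) for x in sols}
```

For a triangle with three colors this yields the six permutations of the colors, each with mass 1/6; with two colors the list is empty and no uniform measure over solutions exists.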

Assuming that a graph $G$ admits a proper $q$-coloring, the uniform measure
over the set of possible solutions is

$$ \mu_G(\underline{x}) = \frac{1}{Z_G} \prod_{(i,j) \in E} \mathbb{I}(x_i \ne x_j) , \qquad (1.22) $$

with $Z_G$ counting the number of proper $q$-colorings of $G$.

k-SAT. In case of /c-satisfiability (in short, fe-SAT), the variables are 
binary Xi € X = {0, 1} and each constraint is of the form (^^^(i), . . . , ^ 
(a;*^-^^, . . . ,x*^j^-^) for some prescribed fc-tuple (i(l), . . . ,i{k)) of indices inV = 
[n] and their prescribed values {x*^-^^^, . . . In this context constraints 

are often referred to as 'clauses' and can be written as the disjunction (logical 
OR) of k variables or their negations. The uniform measure over solutions 
of an instance of this problem, if such solutions exist, is then 

/i(x) = ^ n l((^ia{l).---'^ia(fc)) ^ «(!)'• --'^IW)) ' 

a=l 

with $Z$ counting the number of solutions. An instance can be associated to a
factor graph, cf. Fig. 1. This is a bipartite graph having two types of nodes:
variable nodes in $V = [n]$ denoting the unknowns $x_1, \dots, x_n$ and function
(or factor) nodes in $F = [m]$ denoting the specified constraints. Variable
node $i$ and function node $a$ are connected by an edge in the factor graph if
and only if variable $x_i$ appears in the $a$-th clause, so
$\partial a = \{i_a(1), \dots, i_a(k)\}$ and $\partial i$ corresponds to the set
of clauses in which $i$ appears.
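
The construction can be made concrete with a short sketch. We assume, for illustration only, that a clause is encoded as a tuple of pairs (variable index, forbidden value $x^*$); the names `factor_graph` and `count_sat_solutions` are hypothetical.

```python
from itertools import product

def factor_graph(n, clauses):
    """Return (da, di): da[a] lists the variables of clause a,
    di[i] lists the clauses in which variable i appears."""
    da = [tuple(i for i, _ in clause) for clause in clauses]
    di = [[] for _ in range(n)]
    for a, variables in enumerate(da):
        for i in variables:
            di[i].append(a)
    return da, di

def count_sat_solutions(n, clauses):
    """Z: number of x in {0,1}^n that differ from the forbidden tuple of every clause."""
    z = 0
    for x in product((0, 1), repeat=n):
        if all(any(x[i] != xstar for i, xstar in clause) for clause in clauses):
            z += 1
    return z

# (x0 OR x1) AND (NOT x1 OR x2): the forbidden tuples are (0, 0) and (1, 0).
clauses = [((0, 0), (1, 0)), ((1, 1), (2, 0))]
```

Here each clause excludes exactly one of the $2^k$ assignments of its variables, in agreement with the indicator in the measure above.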

In general, such a construction associates to an arbitrary CSP instance a
factor graph $G = (V, F, E)$. The uniform measure over solutions of such an
instance is then of the form

$$\mu_{G,\psi}(\underline{x}) = \frac{1}{Z} \prod_{a \in F} \psi_a(\underline{x}_{\partial a}) , \tag{1.23}$$

for a suitable choice of $\psi \equiv \{\psi_a(\cdot) : a \in F\}$. Such measures can also be
viewed as the zero temperature limit of certain Boltzmann distributions.
We note in passing that the probability measure of Eq. (1.4) corresponds to
the special case where all function nodes are of degree two.

1.2.3. Communications, estimation, detection 

We describe next a canonical way of phrasing problems from mathematical 
engineering in terms of graphical models. Though we do not detail it here, 
this approach applies to many specific cases of interest. 

Let $X_1, \dots, X_n$ be a collection of i.i.d. 'hidden' random variables with a
common distribution $p_0(\cdot)$ over a finite alphabet $\mathcal{X}$. We want to estimate
these variables from a given collection of observations $Y_1, \dots, Y_m$. The $a$-th
observation (for $a \in [m]$) is a random function of the $X_i$'s for which
$i \in \partial a = \{i_a(1), \dots, i_a(k)\}$. By this we mean that $Y_a$ is
conditionally independent of all the other variables given $\{X_i : i \in \partial a\}$
and we write

$$\mathbb{P}\{Y_a \in A \,|\, \underline{X}_{\partial a} = \underline{x}_{\partial a}\} = Q_a(A \,|\, \underline{x}_{\partial a}) , \tag{1.24}$$

for some probability kernel $Q_a(\cdot\,|\,\cdot)$.

The a posteriori distribution of the hidden variables given the observations
is thus

$$\mu_{\underline{y}}(\underline{x}) = \frac{1}{Z(\underline{y})} \prod_{a=1}^{m} Q_a(y_a \,|\, \underline{x}_{\partial a}) \prod_{i=1}^{n} p_0(x_i) . \tag{1.25}$$

1.2.4. Graph and graph ensembles

The structure of the underlying graph G is of much relevance for the general 
measures of (1.4). The same applies in the specific examples we have 
outlined in Section 1.2. 

As already hinted, we focus here on (random) graphs that lack finite
dimensional Euclidean structure. A few well known ensembles of such graphs
(cf. [54]) are:

I. Random graphs with a given degree distribution. Given a probability
distribution $\{P_l\}_{l \ge 0}$ over the non-negative integers, for each value of
$n$ one draws the graph $G_n$ uniformly at random from the collection of
all graphs with $n$ vertices of which precisely $\lfloor nP_k \rfloor$ are of degree $k \ge 1$
(moving one vertex from degree $k$ to $k+1$ if needed for an even sum
of degrees). We will denote this ensemble by $\mathbb{G}(P, n)$.
II. The ensemble of random $k$-regular graphs corresponds to $P_k = 1$ (with
$kn$ even). Equivalently, this is defined by the set of all graphs $G_n$ over
$n$ vertices with degree $k$, endowed with the uniform measure. With a
slight abuse of notation, we will denote it by $\mathbb{G}(k, n)$.
III. Erdős-Rényi graphs. This is the ensemble of all graphs $G_n$ with $n$
vertices and $m = \lfloor n\alpha \rfloor$ edges endowed with the uniform measure. A
slightly modified ensemble is the one in which each edge $(i,j)$ is present
independently with probability $n\alpha/\binom{n}{2}$. We denote it as $\mathbb{G}(\alpha, n)$.
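
The independent-edge variant of ensemble III is easy to sample directly. The sketch below is an illustration under our own naming (`sample_erdos_renyi`, `degrees`) and with an arbitrary seed; it uses the identity $n\alpha/\binom{n}{2} = 2\alpha/(n-1)$, so the average degree is close to $2\alpha$.

```python
import random

def sample_erdos_renyi(n, alpha, rng):
    """Each of the C(n,2) edges is present independently
    with probability n*alpha / C(n,2) = 2*alpha / (n-1)."""
    p = 2.0 * alpha / (n - 1)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def degrees(n, edges):
    """Degree sequence of the graph on vertices 0..n-1 with the given edge list."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

rng = random.Random(0)  # fixed seed, chosen arbitrarily
n, alpha = 1000, 1.5
edges = sample_erdos_renyi(n, alpha, rng)
mean_degree = sum(degrees(n, edges)) / n  # close to 2 * alpha for large n
```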

As further shown in Section 2.1, an important property of these graph
ensembles is that they converge locally to trees. Namely, for any integer $\ell$,
the depth-$\ell$ neighborhood $\mathsf{B}_i(\ell)$ of a uniformly chosen random
vertex $i$ converges in distribution as $n \to \infty$ to a certain random tree of
depth (at most) $\ell$.

1.3. Detour: The Ising model on the integer lattice

In statistical physics it is most natural to consider models with local
interactions on a finite dimensional integer lattice $\mathbb{Z}^d$, where $d = 2$ and $d = 3$
are often the physically relevant ones. While such models are of course of
non-mean field type, taking a short detour we next present a classical result
about ferromagnetic Ising models on finite subsets of $\mathbb{Z}^2$.

Theorem 1.8. Let $\mathbb{E}_{n,\beta}$ denote expectations with respect to the
ferromagnetic Ising measure (1.20) at zero magnetic field, in case $G = (V, E)$ is
a square grid of side $\sqrt{n}$. Then, for large $n$ the average magnetization
$\overline{X} = n^{-1} \sum_{i=1}^{n} X_i$ concentrates around zero for high temperature but not
for low temperature. More precisely, for some $\beta_0 > 0$,

$$\lim_{\beta \to \infty} \inf_{n} \mathbb{E}_{n,\beta}\{|\overline{X}|\} = 1 , \tag{1.26}$$

$$\lim_{n \to \infty} \mathbb{E}_{n,\beta}\{|\overline{X}|^2\} = 0 \qquad \forall \beta < \beta_0 . \tag{1.27}$$

While this theorem and its proof refer to $\mathbb{Z}^2$, the techniques we use are
more general.

Low temperature: Peierls argument. The proof of (1.26) is taken from
[47] and based on the Peierls contour representation for the two dimensional
Ising model. We start off by reviewing this representation. First, given a
square grid $G = (V, E)$ of side $\sqrt{n}$ in $\mathbb{Z}^2$, for each $(i,j) \in E$ draw a
perpendicular edge of length one, centered at the midpoint of $(i,j)$. Let $E^*$ denote
the collection of all these perpendicular edges and $V^*$ the collection of their
end points, viewed as a finite subset of $\mathbb{R}^2$. A contour is a simple path on
the 'dual' graph $G^* = (V^*, E^*)$, either closed or with both ends at boundary
(i.e. degree one) vertices. A closed contour $C$ divides $V$ into two subsets, the
inside of $C$ and the outside of $C$. We further call 'inside' the smaller
of the two subsets into which a non-closed contour divides $V$ (an arbitrary
convention can be used in case the latter two sets are of equal size). A Peierls
contours configuration $(\mathcal{C}, s)$ consists of a sign $s \in \{+1, -1\}$ and an
edge-disjoint finite collection $\mathcal{C}$ of non-crossing contours (that is, whenever two
contours share a vertex, each of them bends there). Starting at an Ising
configuration $\underline{x} \in \Omega \equiv \{+1, -1\}^V$ note that the set
$V_+(\underline{x}) = \{v \in V : x_v = +1\}$ is separated from
$V_-(\underline{x}) = \{v \in V : x_v = -1\}$ by an edge-disjoint finite
collection $\mathcal{C} = \mathcal{C}(\underline{x})$ of non-crossing contours. Further, it is not hard to
check that the non-empty set $U(\underline{x}) = \{v \in V : v$ not inside any contour
from $\mathcal{C}\}$ is either contained in $V_+(\underline{x})$, in which case $s(\underline{x}) = +1$, or in
$V_-(\underline{x})$, in which case $s(\underline{x}) = -1$, partitioning $\Omega$ into
$\Omega_+ = \{\underline{x} : s(\underline{x}) = +1\}$ and
$\Omega_- = \{\underline{x} : s(\underline{x}) = -1\}$. In the reverse direction, the Ising configuration is
read off a Peierls contours configuration $(\mathcal{C}, s)$ by setting $x_v = s$ when the
number of contours $C \in \mathcal{C}$ such that $v \in V$ lies in the inside of $C$ is even
while $x_v = -s$ when it is odd. The mapping $\underline{x} \mapsto -\underline{x}$ exchanges
$\Omega_+$ with $\Omega_-$ so

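$$\mathbb{E}_{n,\beta}[\,|\overline{X}|\,] \ge 2\, \mathbb{E}_{n,\beta}[\overline{X}\, \mathbb{I}(\underline{X} \in \Omega_+)] = 1 - \frac{4}{n}\, \mathbb{E}_{n,\beta}[\,|V_-(\underline{X})|\, \mathbb{I}(\underline{X} \in \Omega_+)] . \tag{1.28}$$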

If $\underline{x}$ is in $\Omega_+$ then $|V_-(\underline{x})|$ is bounded by the total number of vertices of
$V$ inside contours of $\mathcal{C}$, which by isoperimetric considerations is at most
$\sum_{C \in \mathcal{C}} |C|^2$ (where $|C|$ denotes the length of contour $C$). Further, our
one-to-one correspondence between Ising and Peierls contours configurations maps
the Ising measure at $\beta > 0$ to uniform $s \in \{+1, -1\}$ independent of $\mathcal{C}$ whose
distribution is the Peierls measure

$$\mu_*(\mathcal{C}) = \frac{1}{Z_*} \prod_{C \in \mathcal{C}} e^{-2\beta |C|} .$$

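Recall that if a given contour $C$ is in some edge-disjoint finite collection $\mathcal{C}$ of
non-crossing contours, then $\mathcal{C}' = \mathcal{C} \setminus C$ is another such collection, with
$\mathcal{C} \mapsto \mathcal{C}'$ injective, from which we easily deduce that
$\mu_*(C \in \mathcal{C}) \le \exp(-2\beta|C|)$ for any fixed contour $C$. Consequently,

$$\mathbb{E}_{n,\beta}[\,|V_-(\underline{X})|\, \mathbb{I}(\underline{X} \in \Omega_+)] \le \sum_{C} |C|^2 \mu_*(C \in \mathcal{C}) \le \sum_{\ell \ge 2} \ell^2 N_c(n,\ell)\, e^{-2\beta\ell} , \tag{1.29}$$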

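where $N_c(n,\ell)$ denotes the number of contours of length $\ell$ for the square
grid of side $\sqrt{n}$. Each such contour is a length $\ell$ path of a non-reversing
nearest neighbor walk in $\mathbb{Z}^2$ starting at some point in $V^*$. Hence,
$N_c(n,\ell) \le |V^*|\, 3^{\ell} \le n\, 3^{\ell+1}$. Combining this bound with (1.28) and (1.29)
we conclude that for all $n$,

$$\mathbb{E}_{n,\beta}[\,|\overline{X}|\,] \ge 1 - \frac{4}{n} \sum_{\ell \ge 2} \ell^2 N_c(n,\ell)\, e^{-2\beta\ell} \ge 1 - 12 \sum_{\ell \ge 2} \ell^2\, 3^{\ell} e^{-2\beta\ell} .$$

We are thus done, as this lower bound converges to one for $\beta \to \infty$.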
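
To get a feeling for how quickly the Peierls bound $1 - 12 \sum_{\ell \ge 2} \ell^2 3^{\ell} e^{-2\beta\ell}$ approaches one, it can be evaluated numerically. The sketch below truncates the series at a cutoff `max_len` of our choosing, which is harmless when $3e^{-2\beta}$ is well below one; the function name is hypothetical.

```python
import math

def peierls_lower_bound(beta, max_len=2000):
    """Evaluate 1 - 12 * sum_{l>=2} l^2 3^l exp(-2 beta l), truncated at max_len.
    The series converges only when 3 exp(-2 beta) < 1."""
    x = 3.0 * math.exp(-2.0 * beta)
    if x >= 1.0:
        raise ValueError("series diverges: need beta > log(3)/2")
    series = sum(l * l * x ** l for l in range(2, max_len))
    return 1.0 - 12.0 * series
```

The bound is vacuous (negative) for moderate $\beta$ and only becomes informative once $\beta$ is large, which is all that (1.26) requires.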

High-temperature expansion. The proof of (1.27), taken from [39], is
by the method of high-temperature expansion which serves us again when
dealing with the unfrustrated XORSAT model in Section 6.1. As in the
low-temperature case, the first step consists of finding an appropriate
'geometrical' representation. To this end, given a subset $U \subseteq V$ of vertices,
let

$$Z_U(\beta) = \sum_{\underline{x}} x_U \exp\Big\{ \beta \sum_{(i,j) \in E} x_i x_j \Big\} , \qquad x_U \equiv \prod_{u \in U} x_u ,$$

and denote by $\mathcal{G}(U)$ the set of subgraphs of $G$ having an odd degree at each
vertex in $U$ and an even degree at all other vertices. Then, with $\theta \equiv \tanh(\beta)$
and $F \subseteq E$ denoting both a subgraph of $G$ and its set of edges, we claim
that

$$Z_U(\beta) = 2^{|V|} (\cosh\beta)^{|E|} \sum_{F \in \mathcal{G}(U)} \theta^{|F|} . \tag{1.30}$$

Indeed, $e^{\beta y} = \cosh(\beta)[1 + y\theta]$ for $y \in \{+1, -1\}$, so by definition

$$Z_U(\beta) = (\cosh\beta)^{|E|} \sum_{\underline{x}} x_U \prod_{(i,j) \in E} [1 + \theta x_i x_j]
= (\cosh\beta)^{|E|} \sum_{F \subseteq E} \theta^{|F|} \sum_{\underline{x}} x_U \prod_{(i,j) \in F} x_i x_j .$$

By symmetry $\sum_{\underline{x}} x_R$ is zero unless each $v \in V$ appears in the set $R$ an
even number of times, in which case the sum is $2^{|V|}$. In particular, the latter
applies for $x_R = x_U \prod_{(i,j) \in F} x_i x_j$ if and only if $F \in \mathcal{G}(U)$, from which our
stated high-temperature expansion (1.30) follows.
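
The identity (1.30) can be checked by brute force on a tiny graph. The sketch below assumes $Z_U(\beta) = \sum_x x_U \exp\{\beta \sum_{(i,j) \in E} x_i x_j\}$ with $x_U = \prod_{u \in U} x_u$, and enumerates both spin configurations and edge subsets; function names and the example graph are ours.

```python
import math
from itertools import product, combinations

def z_direct(n, edges, U, beta):
    """Z_U(beta) = sum_x x_U exp(beta * sum_{(i,j) in E} x_i x_j)."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        x_U = math.prod(x[u] for u in U)
        total += x_U * math.exp(beta * sum(x[i] * x[j] for i, j in edges))
    return total

def z_expansion(n, edges, U, beta):
    """Right-hand side of (1.30): sum over edge subsets F whose odd-degree
    vertices are exactly those of U."""
    theta = math.tanh(beta)
    total = 0.0
    for size in range(len(edges) + 1):
        for F in combinations(edges, size):
            deg = [0] * n
            for i, j in F:
                deg[i] += 1
                deg[j] += 1
            if all((deg[v] % 2 == 1) == (v in U) for v in range(n)):
                total += theta ** size
    return 2 ** n * math.cosh(beta) ** len(edges) * total

# The 2x2 grid, i.e. a 4-cycle on vertices 0..3.
square = [(0, 1), (1, 3), (3, 2), (2, 0)]
```

For $U$ of odd size the set $\mathcal{G}(U)$ is empty, and both sides vanish.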

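We next use this expansion to get a uniform in $n$ decay of correlations at
all $\beta < \beta_0 \equiv \mathrm{atanh}(1/3)$, with an exponential rate with respect to the graph
distance $d(i,j)$. More precisely, we claim that for any such $\beta$, $n$ and $i, j \in V$

$$\mathbb{E}_{n,\beta}\{X_i X_j\} \le (1 - 3\theta)^{-1} (3\theta)^{d(i,j)} . \tag{1.31}$$

Indeed, from (1.30) we know that

$$\mathbb{E}_{n,\beta}\{X_i X_j\} = \frac{Z_{\{i,j\}}(\beta)}{Z_{\emptyset}(\beta)} = \frac{\sum_{F \in \mathcal{G}(\{i,j\})} \theta^{|F|}}{\sum_{F' \in \mathcal{G}(\emptyset)} \theta^{|F'|}} .$$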

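Let $\mathcal{F}(i,j)$ denote the collection of all simple paths from $i$ to $j$ in $\mathbb{Z}^2$ and
for each such path $F_{i,j}$, denote by $\mathcal{G}(\emptyset, F_{i,j})$ the sub-collection of graphs in
$\mathcal{G}(\emptyset)$ that have no edge in common with $F_{i,j}$. The sum of vertex degrees
in a connected component of a graph $F$ is even, hence any $F \in \mathcal{G}(\{i,j\})$
contains some path $F_{i,j} \in \mathcal{F}(i,j)$. Further, $F$ is the edge-disjoint union of
$F_{i,j}$ and $F' = F \setminus F_{i,j}$ with $F'$ having an even degree at each vertex. As
$F' \in \mathcal{G}(\emptyset, F_{i,j})$ we thus deduce that

$$\mathbb{E}_{n,\beta}\{X_i X_j\} \le \sum_{F_{i,j} \in \mathcal{F}(i,j)} \theta^{|F_{i,j}|}\, \frac{\sum_{F' \in \mathcal{G}(\emptyset, F_{i,j})} \theta^{|F'|}}{\sum_{F' \in \mathcal{G}(\emptyset)} \theta^{|F'|}} \le \sum_{F_{i,j} \in \mathcal{F}(i,j)} \theta^{|F_{i,j}|} .$$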

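The number of paths in $\mathcal{F}(i,j)$ of length $\ell$ is at most $3^{\ell}$ and their
minimal length is $d(i,j)$. Plugging this in the preceding bound establishes our
correlation decay bound (1.31).

We are done now, for there are at most $8d$ vertices in $\mathbb{Z}^2$ at distance $d \ge 1$
from each $i \in \mathbb{Z}^2$. Hence,

$$\mathbb{E}_{n,\beta}\{|\overline{X}|^2\} = \frac{1}{n^2} \sum_{i,j \in V} \mathbb{E}_{n,\beta}\{X_i X_j\} \le \frac{1}{n (1 - 3\theta)} \Big[ 1 + \sum_{d \ge 1} 8d\, (3\theta)^d \Big] ,$$

which for $\theta < 1/3$ decays to zero as $n \to \infty$.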



2. Ising models on locally tree-like graphs 

A ferromagnetic Ising model on the finite graph $G$ (with vertex set $V$, and
edge set $E$) is defined by the Boltzmann distribution $\mu_{\beta,B}(\underline{x})$ of (1.20) with
$\beta \ge 0$. In the following it is understood that, unless specified otherwise, the
model is ferromagnetic, and we will call it 'Ising model on $G$.'

For sequences of graphs $G_n = (V_n, E_n)$ of diverging size $n$, non-rigorous
statistical mechanics techniques, such as the 'replica' and 'cavity methods,'
make a number of predictions on this model when the graph $G$ 'lacks any
finite-dimensional structure.' The most basic quantity in this context is the
asymptotic free entropy density, cf. Eq. (1.12),

$$\phi(\beta, B) \equiv \lim_{n \to \infty} \frac{1}{n} \log Z_n(\beta, B) . \tag{2.1}$$
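
As a sanity check of definition (2.1) in a case where everything is explicit, consider the $n$-cycle, whose local limit is the infinite line. The sketch below assumes the Boltzmann weight $\exp\{\beta \sum_{(i,j)} x_i x_j + B \sum_i x_i\}$ and compares brute-force enumeration with the classical transfer matrix formula $Z_n = \lambda_+^n + \lambda_-^n$, so that $\phi(\beta, B) = \log \lambda_+$. The function names are ours.

```python
import math
from itertools import product

def log_partition_cycle(n, beta, B):
    """Brute-force log Z_n(beta, B) for the Ising model on the n-cycle."""
    z = 0.0
    for x in product((-1, 1), repeat=n):
        energy = beta * sum(x[i] * x[(i + 1) % n] for i in range(n)) + B * sum(x)
        z += math.exp(energy)
    return math.log(z)

def transfer_eigenvalues(beta, B):
    """Eigenvalues (lambda_plus, lambda_minus) of the 2x2 transfer matrix
    of the one-dimensional Ising chain."""
    root = math.sqrt(math.exp(2 * beta) * math.sinh(B) ** 2 + math.exp(-2 * beta))
    center = math.exp(beta) * math.cosh(B)
    return center + root, center - root
```

Since $|\lambda_-| < \lambda_+$, the ratio $n^{-1} \log Z_n$ converges to $\log \lambda_+$ exponentially fast in $n$.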

The Curie-Weiss model, cf. Section 1.1, corresponds to the complete graph
$G_n = K_n$. Predictions exist for a much wider class of models and graphs,
most notably, sparse random graphs with bounded average degree that arise
in a number of problems from combinatorics and theoretical computer
science (cf. the examples of Section 1.2.2). An important new feature of sparse
graphs is that one can introduce a notion of distance between vertices as the
length of the shortest path connecting them. Consequently, phase transitions
and coexistence can be studied with respect to the correlation decay
properties of the underlying measure. It turns out that this approach is particularly
fruitful and allows one to characterize these phenomena in terms of appropriate
features of Gibbs measures on infinite trees. This direction is pursued in [58]
in the case of random constraint satisfaction problems.

Statistical mechanics also provides methods for approximating the local 
marginals of the Boltzmann measure of (1.20). Of particular interest is the 
algorithm known in artificial intelligence and computer science under the 
name of belief propagation. Loosely speaking, this procedure consists of solv- 
ing by iteration certain mean field (cavity) equations. Belief propagation is 
shown in [29] to converge exponentially fast for an Ising model on any graph 
(even in a low-temperature regime lacking uniform decorrelation), with re- 
sulting asymptotically tight estimates for large locally tree-like graphs (see 
Section 2.3). 
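
To illustrate the iteration of cavity equations, here is a minimal belief propagation sketch for the Ising model with uniform field $B$ on a finite tree, where the fixed point is reached after finitely many sweeps and reproduces the exact magnetizations. The parallel update schedule, the number of sweeps and all names are our own choices, not a description of the algorithm analyzed in [29].

```python
import math
from itertools import product

def bp_magnetizations(n, edges, beta, B, iters=50):
    """Iterate cavity fields h[(i, j)] (message from i to j), then return
    the BP estimate of <x_i> for each vertex i."""
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    h = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}

    def incoming(k, i):
        # effective field that neighbor k exerts on i through the coupling beta
        return math.atanh(math.tanh(beta) * math.tanh(h[(k, i)]))

    for _ in range(iters):
        # parallel update: the right-hand side only uses the previous messages
        h = {(i, j): B + sum(incoming(k, i) for k in nbrs[i] if k != j)
             for (i, j) in h}
    return [math.tanh(B + sum(incoming(k, i) for k in nbrs[i])) for i in range(n)]

def exact_magnetizations(n, edges, beta, B):
    """Brute-force <x_i> under weights exp(beta sum x_i x_j + B sum x_i)."""
    num, den = [0.0] * n, 0.0
    for x in product((-1, 1), repeat=n):
        w = math.exp(beta * sum(x[i] * x[j] for i, j in edges) + B * sum(x))
        den += w
        for i in range(n):
            num[i] += w * x[i]
    return [v / den for v in num]

# Hypothetical example: a small tree on 5 vertices.
tree = [(0, 1), (0, 2), (2, 3), (2, 4)]
```

On graphs with cycles the same iteration only yields an approximation, whose quality on locally tree-like graphs is the subject of Section 2.3.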

2.1. Locally tree-like graphs and conditionally independent trees 

We follow here [29], where the asymptotic free entropy density (2.1) is
determined rigorously for certain sparse graph sequences $\{G_n\}$ that converge
locally to trees. In order to make this notion more precise, we denote by
$\mathsf{B}_i(t)$ the subgraph induced by vertices of $G_n$ whose distance from $i$ is at
most $t$. Further, given two rooted trees $T_1$ and $T_2$ of the same size, we write
$T_1 \simeq T_2$ if $T_1$ and $T_2$ are identical upon labeling their vertices in a breadth
first fashion following lexicographic order among siblings.

Definition 2.1. Let $P_n$ denote the law of the ball $\mathsf{B}_i(t)$ when $i \in V_n$ is a
uniformly chosen random vertex. We say that $\{G_n\}$ converges locally to the
random rooted tree $\mathsf{T}$ if, for any finite $t$ and any rooted tree $T$ of depth at
most $t$,

$$\lim_{n \to \infty} P_n\{\mathsf{B}_i(t) \simeq T\} = \mathbb{P}\{\mathsf{T}(t) \simeq T\} , \tag{2.2}$$

where $\mathsf{T}(t)$ denotes the subtree of first $t$ generations of $\mathsf{T}$.
We also say that $\{G_n\}$ is uniformly sparse if

$$\lim_{l \to \infty} \limsup_{n \to \infty} \frac{1}{|V_n|} \sum_{i \in V_n} |\partial i|\, \mathbb{I}(|\partial i| \ge l) = 0 , \tag{2.3}$$

where $|\partial i|$ denotes the size of the set $\partial i$ of neighbors of $i \in V_n$ (i.e. the degree
of $i$).

The proof that for locally tree-like graphs $\phi_n(\beta, B) = \frac{1}{n} \log Z_n(\beta, B)$
converges to (an explicit) limit $\phi(\beta, B)$ consists of two steps:

(a). Reduce the computation of $\phi_n(\beta, B)$ to computing expectations of
local (in $G_n$) quantities with respect to the Boltzmann measure (1.20).
This is achieved by noting that the derivative of $\phi_n(\beta, B)$ with respect
to $\beta$ is a sum of such expectations.

(b). Show that under the Boltzmann measure (1.20) on $G_n$ expectations of
local quantities are, for $t$ and $n$ large, well approximated by the same
expectations with respect to an Ising model on the associated random
tree $\mathsf{T}(t)$ (a philosophy related to that of [9]).
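
The two notions of Definition 2.1 are straightforward to probe on a given finite graph. The sketch below, with names of our choosing and a simple graph encoded as a dictionary of adjacency lists, extracts the vertex set of a ball by breadth first search, tests whether the induced ball is a tree, and evaluates the sum appearing inside (2.3).

```python
from collections import deque

def ball(adj, root, t):
    """Vertices within graph distance t of root, found by breadth first search."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == t:
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)

def ball_is_tree(adj, root, t):
    """The induced ball is connected, so it is a tree iff #edges = #vertices - 1."""
    vertices = ball(adj, root, t)
    n_edges = sum(1 for v in vertices for w in adj[v] if w in vertices) // 2
    return n_edges == len(vertices) - 1

def sparsity_tail(adj, l):
    """(1/|V|) * sum_i |di| * 1(|di| >= l), the quantity inside (2.3)."""
    return sum(len(nb) for nb in adj.values() if len(nb) >= l) / len(adj)

# Hypothetical example: the 6-cycle.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

On the 6-cycle the balls of radius one and two are paths, while the ball of radius three is the whole cycle and hence not a tree.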

The key is of course step (b), and the challenge is to carry it out when
the parameter $\beta$ is large and we no longer have uniqueness of the Gibbs
measure on the limiting tree $\mathsf{T}$. Indeed, this is done in [29] for the following
collection of trees of conditionally independent (and of bounded average)
offspring numbers.

Definition 2.2. An infinite labeled tree $\mathsf{T}$ rooted at the vertex ø is called
conditionally independent if for each integer $k \ge 0$, conditional on the
subtree $\mathsf{T}(k)$ of the first $k$ generations of $\mathsf{T}$, the number of offspring $\Delta_j$ for
$j \in \partial\mathsf{T}(k)$ are independent of each other, where $\partial\mathsf{T}(k)$ denotes the set of
vertices at generation $k$. We further assume that the (conditional on $\mathsf{T}(k)$)
first moments of $\Delta_j$ are uniformly bounded by a given non-random finite
constant $\Delta$ and say that an unlabeled rooted tree $\mathsf{T}$ is conditionally
independent if $\mathsf{T} \simeq \mathsf{T}'$ for some conditionally independent labeled rooted tree
$\mathsf{T}'$.

As shown in [29, Section 4] (see also Theorem 2.10), on such a tree, lo- 
cal expectations are insensitive to boundary conditions that stochastically 
dominate the free boundary condition. Our program then follows by mono- 
tonicity arguments. An example of the monotonicity properties enjoyed by 
the Ising model is provided by Lemma 2.12. 

We next provide a few examples of well known random graph ensembles
that are uniformly sparse and converge locally to conditionally independent
trees. To this end, let $P = \{P_k : k \ge 0\}$ be a probability distribution
over the non-negative integers, with finite, positive first moment $\overline{P}$, set
$\rho_k = (k+1) P_{k+1} / \overline{P}$ and denote its mean as $\overline{\rho}$. We denote by
$\mathsf{T}(\rho, t)$ the rooted Galton-Watson tree of $t \ge 0$ generations, i.e. the random tree such
that each node has offspring distribution $\{\rho_k\}$, and the offspring numbers
at different nodes are independent. Further, $\mathsf{T}(P, \rho, t)$ denotes the modified
ensemble where only the offspring distribution at the root is changed to
$P$. In particular, $\mathsf{T}(P, \rho, \infty)$ is clearly conditionally independent. Other
examples of conditionally independent trees include: (a) deterministic trees
with bounded degree; (b) percolation clusters on such trees; (c) multi-type
branching processes.
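
The passage from the degree law $P$ to the offspring law $\rho_k = (k+1) P_{k+1} / \overline{P}$ is a one-line computation. The sketch below, with hypothetical helper names and $P$ given as a finite list (a truncation in the Poisson case), illustrates the two cases used later: a $k$-regular law, for which $\rho$ is concentrated on $k-1$, and the Poisson law, for which $\rho = P$.

```python
import math

def size_biased(P):
    """rho_k = (k+1) P_{k+1} / Pbar for a degree distribution P given as a list."""
    mean = sum(k * p for k, p in enumerate(P))
    return [(k + 1) * P[k + 1] / mean for k in range(len(P) - 1)]

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax (truncated support)."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax + 1)]
```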

When working with random graph ensembles, it is often convenient to
work with the configuration models [17] defined as follows. In the case of
the Erdős-Rényi random graph, one draws $m$ i.i.d. edges by choosing their
endpoints $i_a, j_a$ independently and uniformly at random for $a = 1, \dots, m$.
For a graph with given degree distribution $\{P_k\}$, one first partitions the
vertex set into subsets $V_0$ of $\lfloor nP_0 \rfloor$ vertices, $V_1$ of $\lfloor nP_1 \rfloor$ vertices,
$V_2$ of $\lfloor nP_2 \rfloor$ vertices, etc. Then associate $k$ half-edges to the vertices in $V_k$ for each $k$
(eventually adding one half edge to the last node, to make their total number
even). Finally, recursively match two uniformly random half edges until there
is no unmatched one. Whenever we need to make the distinction we denote
by $P_*(\cdot)$ probabilities under the corresponding configuration model.
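
The half-edge matching just described can be sketched in a few lines. We take a degree sequence as input rather than the partition $V_0, V_1, \dots$, and realize the uniform matching by shuffling the list of half-edges and pairing consecutive entries; these choices and the function names are ours.

```python
import random

def configuration_model(degree_sequence, rng):
    """Match half-edges uniformly at random; returns a multigraph as an edge list."""
    half_edges = [v for v, k in enumerate(degree_sequence) for _ in range(k)]
    if len(half_edges) % 2 == 1:
        half_edges.append(len(degree_sequence) - 1)  # extra half-edge on the last node
    rng.shuffle(half_edges)  # a uniform shuffle paired off consecutively is a uniform matching
    return [(half_edges[2 * a], half_edges[2 * a + 1])
            for a in range(len(half_edges) // 2)]

def is_simple(edges):
    """True when the multigraph has neither self-loops nor double edges."""
    seen = set()
    for i, j in edges:
        key = (min(i, j), max(i, j))
        if i == j or key in seen:
            return False
        seen.add(key)
    return True
```

Rejecting the samples for which `is_simple` fails yields a uniformly random simple graph with the prescribed degrees, which is the mechanism behind the transfer of results to the uniform model.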

The following simple observation transfers results from configuration mod- 
els to the associated uniform models. 

Lemma 2.3. Let $A_n$ be a sequence of events, such that, under the
configuration model

$$\sum_{n} P_*(G_n \notin A_n) < \infty . \tag{2.4}$$

Further, assume $m = \lfloor \alpha n \rfloor$ with $\alpha$ fixed (for Erdős-Rényi random graphs),
or $\{P_k\}$ fixed, with bounded first moment (for general degree distribution).
Then, almost surely under the uniform model, property $A_n$ holds for all $n$
large enough.

Proof. The point is that the graph chosen under the configuration model is
distributed uniformly when further conditional on the property $L_n$ that it
has neither self-loops nor double edges (see [54]). Consequently,

$$P(G_n \notin A_n) = P_*(G_n \notin A_n \,|\, L_n) \le P_*(G_n \notin A_n) / P_*(L_n) .$$

The thesis follows by recalling that $P_*(L_n)$ is bounded away from $0$ uniformly
in $n$ for the models described here (cf. [54]), and applying the Borel-Cantelli
lemma. □

Our next lemma ensures that we only need to check the local (weak)
convergence in expectation with respect to the configuration model.

Lemma 2.4. Given a finite rooted tree $T$ of at most $t$ generations, assume
that

$$\lim_{n \to \infty} P_*\{\mathsf{B}_i(t) \simeq T\} = Q_T , \tag{2.5}$$

for a uniformly random vertex $i \in G_n$. Then, under both the configuration
and the uniform models of Lemma 2.3, $P_n\{\mathsf{B}_i(t) \simeq T\} \to Q_T$ almost surely.

Proof. Per given value of $n$ consider the random variable $Z \equiv P_n\{\mathsf{B}_i(t) \simeq T\}$.
In view of Lemma 2.3 and the assumption (2.5) that
$\mathbb{E}_*[Z] = P_*\{\mathsf{B}_i(t) \simeq T\}$ converges to $Q_T$, it suffices to show that
$P_*\{|Z - \mathbb{E}_*[Z]| > \delta\}$ is summable (in $n$), for any fixed $\delta > 0$. To this end,
let $r$ denote the maximal degree of $T$. The presence of an edge $(j,k)$ in the
resulting multi-graph $G_n$ affects the event $\{\mathsf{B}_i(t) \simeq T\}$ only if there exists a
path of length at most $t$ in $G_n$ between $i$ and $\{j,k\}$, the maximal degree along
which is at most $r$. Per given choice of $(j,k)$ there are at most
$u = u(r,t) \equiv 2 \sum_{l=0}^{t} r^l$ such values of $i \in [n]$,
hence the Lipschitz norm of $Z$ as a function of the location of the $m$ edges
of $G_n$ is bounded by $2u/n$. Let $G_n(t)$ denote the graph formed by the first $t$
edges (so $G_n(m) = G_n$), and introduce the martingale $Z(t) = \mathbb{E}_*[Z \,|\, G_n(t)]$,
so $Z(m) = Z$ and $Z(0) = \mathbb{E}_*[Z]$. A standard argument (cf. [10, 81]) shows
that the conditional laws $P_*(\cdot \,|\, G_n(t))$ and $P_*(\cdot \,|\, G_n(t+1))$ of $G_n$ can be
coupled in such a way that the resulting two (conditional) realizations of
$G_n$ differ by at most two edges. Consequently, applying the Azuma-Hoeffding
inequality we deduce that for any $T$, $M$ and $\delta > 0$, some $c_0 = c_0(\delta, M, u)$
positive and all $m \le nM$,

$$P_*(|Z - \mathbb{E}_*[Z]| \ge \delta) = P_*(|Z(m) - Z(0)| \ge \delta) \le 2 e^{-c_0 n} , \tag{2.6}$$

which is more than enough for completing the proof. □

Proposition 2.5. Given a distribution $\{P_l\}_{l \ge 0}$ of finite mean, let $\{G_n\}_{n \ge 1}$
be a sequence of graphs whereby $G_n$ is distributed according to the ensemble
$\mathbb{G}(P, n)$ with degree distribution $P$. Then the sequence $\{G_n\}$ is almost surely
uniformly sparse and converges locally to $\mathsf{T}(P, \rho, \infty)$.

Proof. Note that for any random graph $G_n$ of degree distribution $P$,

$$E_n(l) \equiv \sum_{i \in V_n} |\partial i|\, \mathbb{I}(|\partial i| \ge l) \le l + n \sum_{k \ge l} k P_k \equiv l + n \overline{P}_l . \tag{2.7}$$

Our assumption that $\overline{P} = \sum_k k P_k$ is finite implies that $\overline{P}_l \to 0$ as $l \to \infty$,
so any such sequence of graphs $\{G_n\}$ is uniformly sparse.

As the collection of finite rooted trees of finite depth is countable, by
Lemma 2.4 we have the almost sure local convergence of $\{G_n\}$ to $\mathsf{T}(P, \rho, \infty)$
once we show that $P_*(\mathsf{B}_i(t) \simeq T) \to \mathbb{P}(\mathsf{T}(P, \rho, t) \simeq T)$ as $n \to \infty$, where
$i \in G_n$ is a uniformly random vertex and $T$ is any fixed finite, rooted tree
of at most $t$ generations.

To this end, we opt to describe the distribution of $\mathsf{B}_i(t)$ under the
configuration model as follows. First fix a non-random partition of $[n]$ to subsets
$V_k$ with $|V_k| = \lfloor nP_k \rfloor$, and assign $k$ half-edges to each vertex in $V_k$. Then,
draw a uniformly random vertex $i \in [n]$. Assume it is in $V_k$, i.e. has $k$
half-edges. Declare these half-edges 'active'. Recursively sample $k$ unpaired
(possibly active) half-edges, and pair the active half-edges to them. Repeat
this procedure for the vertices thus connected to $i$ and proceed in a breadth
first fashion for $t$ generations (i.e. until all edges of $\mathsf{B}_i(t)$ are determined).
Consider now the modified procedure in which, each time a half-edge is
selected, the corresponding vertex is put in a separate list, and replaced by
a new one with the same number of half-edges, in the graph. Half-edges in
the separate list are active, but they are not among the candidates in the
sampling part. This modification yields $\mathsf{B}_i(t)$ which is a random tree,
specifically, an instance of $\mathsf{T}(P^{(n)}, \rho^{(n)}, t)$, where
$P^{(n)}_k = \lfloor nP_k \rfloor / \sum_l \lfloor nP_l \rfloor$. Clearly,
$\mathsf{T}(P^{(n)}, \rho^{(n)}, t)$ converges in distribution as $n \to \infty$ to $\mathsf{T}(P, \rho, t)$. The proof is
thus complete by providing a coupling in which the probability that either
$\mathsf{B}_i(t) \simeq T$ under the modified procedure and $\mathsf{B}_i(t) \not\simeq T$ under the original
procedure (i.e. the configuration model), or vice versa, is at most $4|T|^2/n$.
Indeed, after $\ell$ steps, a new vertex $j$ is sampled by the pairing with
probability $p_j \propto k_j(\ell)$ in the original procedure and $p'_j \propto k_j(0)$ in the modified
one, where $k_j(\ell)$ is the number of free half-edges associated to vertex $j$ at
step $\ell$. Having to consider at most $|T|$ steps and stopping once the original
and modified samples differ, we get the stated coupling upon noting that
$\|p - p'\|_{\rm TV} \le 2|T|/n$ (as both samples must then be subsets of the given
tree $T$). □

Proposition 2.6. Let $\{G_n\}_{n \ge 1}$ be a sequence of Erdős-Rényi random graphs,
i.e. of graphs drawn either from the ensemble $\mathbb{G}(\alpha, n)$ or from the uniform
model with $m = m(n)$ edges, where $m(n)/n \to \alpha$. Then, the sequence $\{G_n\}$ is
almost surely uniformly sparse and converges locally to the Galton-Watson
tree $\mathsf{T}(P, \rho, \infty)$ with Poisson$(2\alpha)$ offspring distribution $P$ (in which case
$\rho_k = P_k$).

Proof. We denote by $P^{(m)}(\cdot)$ and $\mathbb{E}^{(m)}(\cdot)$ the probabilities and expectations
with respect to a random graph $G_n$ chosen uniformly from the ensemble of
all graphs of $m$ edges, with $P^{(m)}_*(\cdot)$ and $\mathbb{E}^{(m)}_*(\cdot)$ in use for the corresponding
configuration model.

We start by proving the almost sure uniform sparsity for graphs $G_n$ from
the uniform ensemble of $m = m(n)$ edges provided $m(n)/n \le M$ for all $n$ and
some finite $M$. To this end, by Lemma 2.3 it suffices to prove this property for
the corresponding configuration model. Setting $Z \equiv n^{-1} E_n(l)$ for $E_n(l)$ of
(2.7) and $P^{(m)}$ to be the Binomial$(2m, 1/n)$ distribution of the degree of each
vertex of $G_n$ in this configuration model, note that
$\mathbb{E}^{(m)}_*[Z] = \overline{P}^{(m)}_l \le \overline{P}_l$
for $\overline{P}_l \equiv \sum_{k \ge l} k P_k$ of the Poisson$(4M)$ degree distribution $P$, any $n \ge 2$
and $m \le nM$. Since $\sum_k k P_k$ is finite, necessarily $\overline{P}_l \to 0$ as $l \to \infty$ and the
claimed almost sure uniform sparsity follows from the summability in $n$, per
fixed $l$ and $\delta > 0$, of $P^{(m)}_*\{Z - \mathbb{E}^{(m)}_*[Z] \ge \delta\}$, uniformly in $m \le nM$. Recall
that the presence of an edge $(j,k)$ in the resulting multi-graph $G_n$ changes
the value of $E_n(l)$ by at most $2l$, hence the Lipschitz norm of $Z$ as a function
of the location of the $m$ edges of $G_n$ is bounded by $2l/n$. Thus, applying
the Azuma-Hoeffding inequality along the lines of the proof of Lemma 2.4
we get here a uniform in $m \le nM$ and summable in $n$ bound of the form of
(2.6).

As argued in proving Proposition 2.5, by Lemma 2.4 we further have the
claimed almost sure local convergence of graphs from the uniform ensembles
of $m = m(n)$ edges, once we verify that (2.5) holds for $P^{(m)}_*(\cdot)$ and
$Q_T = \mathbb{P}\{\mathsf{T}(P, \rho, t) \simeq T\}$ with the Poisson$(2\alpha)$ offspring distribution $P$. To this end,
fix a finite rooted tree $T$ of depth at most $t$ and order its vertices from $1$
(for ø) to $|T|$ in a breadth first fashion following lexicographic order among
siblings. Let $\Delta_v$ denote the number of offspring of $v \in T$ with $T(t-1)$ the
sub-tree of vertices within distance $t-1$ from the root of $T$ (so $\Delta_v = 0$ for
$v \notin T(t-1)$), and denoting by $b \equiv \sum_{v \le |T(t-1)|} \Delta_v = |T| - 1$ the number of
edges of $T$. Under our equivalence relation between trees there are

$$\prod_{v=1}^{b} \frac{n - v}{\Delta_v!}$$

distinct embeddings of $T$ in $[n]$ for which the root of $T$ is mapped to $1$.
Fixing such an embedding, the event $\{\mathsf{B}_1(t) \simeq T\}$ specifies the $b$ edges in
the restriction of $E_n$ to the vertices of $T$ and further forbids having any edge
in $E_n$ between $T(t-1)$ and a vertex outside $T$. Thus, under the configuration
model $P^{(m)}_*(\cdot)$ with $m$ edges chosen with replacement uniformly among the
$n_2 \equiv \binom{n}{2}$ possible edges, the event $\{\mathsf{B}_1(t) \simeq T\}$ occurs per such an embedding
for precisely $(n_2 - a - b)^{m-b}\, m!/(m-b)!$ of the $n_2^m$ possible edge selections,

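where $a = (n - |T|)\,|T(t-1)| + \binom{b}{2}$. With $P^{(m)}_*(\mathsf{B}_i(t) \simeq T)$ independent of
$i \in [n]$, it follows that

$$P^{(m)}_*(\mathsf{B}_i(t) \simeq T) = \frac{2^b\, m!}{n^b (m-b)!} \Big(1 - \frac{a+b}{n_2}\Big)^{m-b} \prod_{v=1}^{b} \frac{n - v}{(n-1)\, \Delta_v!} .$$

Since $b$ is independent of $n$ and $a = n|T(t-1)| + O(1)$, it is easy to verify
that for $n \to \infty$ and $m/n \to \alpha$ the latter expression converges to

$$Q_T \equiv (2\alpha)^b e^{-2\alpha |T(t-1)|} \prod_{v=1}^{b} \frac{1}{\Delta_v!} = \prod_{v=1}^{|T(t-1)|} P_{\Delta_v} = \mathbb{P}\{\mathsf{T}(P, \rho, t) \simeq T\}$$

(where $P_k = (2\alpha)^k e^{-2\alpha}/k!$, hence $\rho_k = P_k$ for all $k$). Further, fixing $\gamma < 1$
and denoting by $I_n$ the interval of width $2n^{\gamma}$ around $\alpha n$, it is not hard to
check that $P^{(m)}_*(\mathsf{B}_i(t) \simeq T) \to Q_T$ uniformly over $m \in I_n$.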
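
The convergence of the finite-$n$ expression to $Q_T$ can be observed numerically. The sketch below takes the number of forbidden edges to be $a = (n - |T|)|T(t-1)| + \binom{b}{2}$, which is our reading of the counting argument (edges from $T(t-1)$ to the outside, plus non-tree edges inside $T$); the function names and the encoding of $T$ by its breadth first offspring numbers are also ours.

```python
import math

def ball_probability(n, m, offspring, t_minus_1_size):
    """Finite-n probability that the ball matches T when m edges are drawn
    with replacement. offspring[v-1] = Delta_v in breadth first order."""
    size = len(offspring)
    b = size - 1                      # number of edges of T
    n2 = n * (n - 1) // 2             # number of possible edges
    # forbidden edges (assumed form, see the lead-in)
    a = (n - size) * t_minus_1_size + b * (b - 1) // 2
    log_p = sum(math.log(m - j) for j in range(b)) - b * math.log(n2)  # m!/(m-b)!/n2^b
    log_p += (m - b) * math.log1p(-(a + b) / n2)
    log_p += sum(math.log(n - v) for v in range(1, b + 1))             # embeddings
    log_p -= sum(math.lgamma(d + 1) for d in offspring)                # prod Delta_v!
    return math.exp(log_p)

def limit_probability(alpha, offspring, t_minus_1_size):
    """Q_T = (2 alpha)^b exp(-2 alpha |T(t-1)|) / prod_v Delta_v!."""
    b = len(offspring) - 1
    denom = math.prod(math.factorial(d) for d in offspring)
    return (2 * alpha) ** b * math.exp(-2 * alpha * t_minus_1_size) / denom
```

For instance, for the tree made of a root with two children ($t = 1$, offspring numbers $2, 0, 0$) one has $a + b = n$ and $Q_T = 2\alpha^2 e^{-2\alpha}$, the Poisson$(2\alpha)$ probability of two offspring.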

Let $P^{(n)}(\cdot)$ and $\mathbb{E}^{(n)}(\cdot)$ denote the corresponding laws and expectations
with respect to random graphs $G_n$ from the ensembles $\mathbb{G}(\alpha, n)$, i.e. where
each edge is chosen independently with probability $q_n = 2\alpha/(n-1)$. The
preceding almost sure local convergence and uniform sparseness extend to
these graphs since each law $P^{(n)}(\cdot)$ is a mixture of the laws
$\{P^{(m)}(\cdot), m = 1, 2, \dots\}$ with mixture coefficients $P^{(n)}(|E_n| = m)$ that are
concentrated on $m \in I_n$. Indeed, by the same argument as in the proof of Lemma 2.3, for
any sequence of events $A_n$,

$$P^{(n)}(G_n \notin A_n) \le P^{(n)}(|E_n| \notin I_n) + \eta^{-1} \sup_{m \in I_n} P^{(m)}_*(G_n \notin A_n) , \tag{2.8}$$

where

$$\eta = \liminf_{n \to \infty} \inf_{m \in I_n} P^{(m)}_*(L_n) ,$$

is strictly positive (cf. [54]). Under $P^{(n)}(\cdot)$ the random variable $|E_n|$ has the
Binomial$(n(n-1)/2, q_n)$ distribution (of mean $\alpha n$). Hence, upon applying
Markov's inequality, we find that for some finite $c_1 = c_1(\alpha)$ and all $n$,

$$P^{(n)}(|E_n| \notin I_n) \le n^{-4\gamma}\, \mathbb{E}^{(n)}[(|E_n| - \alpha n)^4] \le c_1 n^{2 - 4\gamma} ,$$

so taking $\gamma > 3/4$ guarantees the summability (in $n$) of $P^{(n)}(|E_n| \notin I_n)$. For
given $\delta > 0$ we already proved the summability in $n$ of
$\sup_{m \in I_n} P^{(m)}_*(G_n \notin A_n)$ both for $A_n = \{n^{-1} E_n(l) < \overline{P}_l + \delta\}$ and for
$A_n = \{|P_n(\mathsf{B}_i(t) \simeq T) - Q_T| < 2\delta\}$. In view of this, considering (2.8) for the
former choice of $A_n$ yields the almost sure uniform sparsity of Erdős-Rényi random graphs
from $\mathbb{G}(\alpha, n)$, while the latter choice of $A_n$ yields the almost sure local
convergence of these random graphs to the Galton-Watson tree $\mathsf{T}(P, \rho, \infty)$ with
Poisson$(2\alpha)$ offspring distribution. □

Remark 2.7. As a special case of Proposition 2.5, almost every sequence
of uniformly random $k$-regular graphs of $n$ vertices converges locally to the
(non-random) rooted $k$-regular infinite tree $\mathsf{T}_k(\infty)$.

Let $\mathsf{T}_k(\ell)$ denote the tree induced by the first $\ell$ generations of $\mathsf{T}_k(\infty)$, i.e.
$\mathsf{T}_k(0) = \{$ø$\}$ and for $\ell \ge 1$ the tree $\mathsf{T}_k(\ell)$ has $k$ offspring at ø and $(k-1)$
offspring for each vertex at generations $1$ to $\ell - 1$. It is easy to check that for
any $k \ge 3$, the sequence of finite trees $\{\mathsf{T}_k(\ell)\}_{\ell \ge 0}$ does not converge locally
to $\mathsf{T}_k(\infty)$. Instead, it converges to the following random $k$-canopy tree (cf.
[7] for a closely related definition).

Lemma 2.8. For any $k \ge 3$, the sequence of finite trees $\{\mathsf{T}_k(\ell)\}_{\ell \ge 0}$
converges locally to the $k$-canopy tree. This random infinite tree, denoted $\mathsf{CT}_k$, is
formed by the union of the infinite ray $\vec{R} \equiv \{(r, r+1), r \ge 0\}$ and additional
finite trees $\{\mathsf{T}_{k-1}(r), r \ge 0\}$ such that $\mathsf{T}_{k-1}(r)$ is rooted at the $r$-th vertex
along $\vec{R}$. The root of $\mathsf{CT}_k$ is on $\vec{R}$ with
$\mathbb{P}(\mathsf{CT}_k$ rooted at $r) = (k-2)/(k-1)^{r+1}$ for $r \ge 0$.

Proof. This local convergence is immediate upon noting that there are
exactly $n_r = k(k-1)^{r-1}$ vertices at generation $r \ge 1$ of $\mathsf{T}_k(\ell)$, hence
$|\mathsf{T}_k(\ell)| = [k(k-1)^{\ell} - 2]/(k-2)$ and
$n_{\ell - r}/|\mathsf{T}_k(\ell)| \to \mathbb{P}(\mathsf{CT}_k$ rooted at $r)$ as $\ell \to \infty$,
for each fixed $r \ge 0$ and $k \ge 3$ (and $\mathsf{B}_i(\ell)$ matches for each $i$ of generation
$\ell - r$ in $\mathsf{T}_k(\ell)$ the ball $\mathsf{B}_r(\ell)$ of the $k$-canopy tree). □
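
The counting in this proof can be verified with exact rational arithmetic. In the sketch below the function names are ours; it checks that the generation sizes sum to $[k(k-1)^{\ell} - 2]/(k-2)$, that the law $(k-2)/(k-1)^{r+1}$ sums to one, and that the fraction of vertices at generation $\ell - r$ approaches it.

```python
from fractions import Fraction

def generation_size(k, r):
    """Number of vertices at generation r of T_k(l): 1 at the root, k (k-1)^(r-1) for r >= 1."""
    return 1 if r == 0 else k * (k - 1) ** (r - 1)

def tree_size(k, depth):
    """Total number of vertices of T_k(depth)."""
    return sum(generation_size(k, r) for r in range(depth + 1))

def canopy_root_law(k, r):
    """P(CT_k rooted at r) = (k-2) / (k-1)^(r+1)."""
    return Fraction(k - 2, (k - 1) ** (r + 1))
```

In particular, a fraction $(k-2)/(k-1)$ of the vertices of $\mathsf{T}_k(\ell)$ are leaves in the limit, which is why the local limit differs from $\mathsf{T}_k(\infty)$.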

Remark 2.9. Note that the k-canopy tree is not conditionally independent. 

2.2. Ising models on conditionally independent trees 

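Following [29] it is convenient to extend the model (1.20) by allowing for
vertex-dependent magnetic fields $B_i$, i.e. to consider

$$\mu(\underline{x}) = \frac{1}{Z(\beta, \underline{B})} \exp\Big\{ \beta \sum_{(i,j) \in E} x_i x_j + \sum_{i \in V} B_i x_i \Big\} . \tag{2.9}$$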

In this general context, it is possible to prove correlation decay results for
Ising models on conditionally independent trees. Beyond their independent
interest, such results play a crucial role in our analysis of models on sparse
graph sequences.

To state these results denote by $\mu^{\ell,0}$ the Ising model (2.9) on $\mathsf{T}(\ell)$ with
magnetic fields $\{B_i\}$ (also called free boundary conditions), and by $\mu^{\ell,+}$ the
modified Ising model corresponding to the limit $B_i \uparrow +\infty$ for all $i \in \partial\mathsf{T}(\ell)$
(also called plus boundary conditions), using $\mu^{\ell}$ for statements that apply
to both free and plus boundary conditions.

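Theorem 2.10. Suppose $\mathsf{T}$ is a conditionally independent infinite tree of
average offspring numbers bounded by $\Delta$, as in Definition 2.2. Let
$\langle \cdot \rangle^{(r)}_i$ denote the expectation with respect to the Ising distribution on the
subtree of $i$ and all its descendants in $\mathsf{T}(r)$ and
$\langle x; y \rangle \equiv \langle xy \rangle - \langle x \rangle \langle y \rangle$ denotes the centered
two point correlation function. There exist $A$ finite and $\lambda$ positive, depending
only on $0 < B_{\min} \le B_{\max}$, $\beta_{\max}$ and $\Delta$ finite, such that if $B_i \le B_{\max}$ for
all $i \in \mathsf{T}(r-1)$ and $B_i \ge B_{\min}$ for all $i \in \mathsf{T}(\ell)$, then for any $r \le \ell$ and
$\beta \le \beta_{\max}$,

$$\mathbb{E}\Big\{ \sum_{i \in \partial\mathsf{T}(r)} \langle x_{ø}; x_i \rangle^{(\ell)}_{ø} \Big\} \le A e^{-\lambda r} . \tag{2.10}$$

If in addition $B_i \le B_{\max}$ for all $i \in \mathsf{T}(\ell - 1)$ then for some $C = C(\beta_{\max}, B_{\max})$
finite

$$\mathbb{E}\, \|\mu^{\ell,+}_{\mathsf{T}(r)} - \mu^{\ell,0}_{\mathsf{T}(r)}\|_{\rm TV} \le A e^{-\lambda(\ell - r)}\, \mathbb{E}\{C^{|\mathsf{T}(r)|}\} . \tag{2.11}$$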

The proof of this theorem, given in [29, Section 4], relies on monotonicity 
properties of the Ising measure, and in particular on the following classical 
inequality. 

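Proposition 2.11 (Griffiths inequalities). Given a finite set $V$ and
parameters $\underline{J} = (J_R, R \subseteq V)$ with $J_R \ge 0$, consider the extended ferromagnetic
Ising measure

$$\mu_{\underline{J}}(\underline{x}) = \frac{1}{Z(\underline{J})} \exp\Big\{ \sum_{R \subseteq V} J_R x_R \Big\} , \tag{2.12}$$

where $\underline{x} \in \{+1, -1\}^V$ and $x_R \equiv \prod_{u \in R} x_u$. Then, for $\underline{X}$ of law $\mu_{\underline{J}}$ and any
$A, B \subseteq V$,

$$\mathbb{E}_{\underline{J}}[X_A] = \frac{1}{Z(\underline{J})} \sum_{\underline{x}} x_A \exp\Big\{ \sum_{R \subseteq V} J_R x_R \Big\} \ge 0 , \tag{2.13}$$

$$\frac{\partial}{\partial J_B} \mathbb{E}_{\underline{J}}[X_A] = \mathrm{Cov}_{\underline{J}}(X_A, X_B) \ge 0 . \tag{2.14}$$

Proof. See [61, Theorem IV.1.21] (and consult [44] for generalizations of
this result).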
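
For a handful of spins the Griffiths inequalities can be confirmed by direct enumeration. The sketch below encodes the couplings as a dictionary from tuples $R$ to $J_R \ge 0$ and uses $X_A X_B = X_{A \triangle B}$ to compute covariances; the specific couplings and all names are hypothetical.

```python
import math
from itertools import product

def extended_ising_moments(n, J):
    """J maps tuples R of vertices to couplings J_R >= 0.
    Returns a function R -> E_J[X_R] computed by enumeration."""
    configs = list(product((-1, 1), repeat=n))
    weights = [math.exp(sum(c * math.prod(x[u] for u in R) for R, c in J.items()))
               for x in configs]
    z = sum(weights)

    def mean(R):
        return sum(w * math.prod(x[u] for u in R) for w, x in zip(weights, configs)) / z

    return mean

def symmetric_difference(A, B):
    """Since x_u^2 = 1, the product X_A X_B equals X_R with R the symmetric difference."""
    return tuple(sorted(set(A) ^ set(B)))

# Hypothetical non-negative couplings on three spins.
J = {(0,): 0.2, (0, 1): 0.5, (1, 2): 0.3, (0, 1, 2): 0.1}
mean = extended_ising_moments(3, J)
```

With these conventions, $\mathrm{Cov}_{\underline{J}}(X_A, X_B)$ is `mean(symmetric_difference(A, B)) - mean(A) * mean(B)`.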

Note that the measure $\mu(\cdot)$ of (2.9) is a special case of $\mu_{\underline{J}}$ (taking
$J_{\{i\}} = B_i$, $J_{\{i,j\}} = \beta$ for all $(i,j) \in E$ and $J_R = 0$ for all other subsets of $V$). Thus,
Griffiths inequalities allow us to compare certain marginals of the latter
measure for a graph $G$ and non-negative $\beta$, $B_i$ with those for other choices
of $G$, $\beta$ and $B_i$. To demonstrate this, we state (and prove) the following well
known general comparison results.

Lemma 2.12. Fixing $\beta \ge 0$ and $B_i \ge 0$, for any finite graph $G = (V,E)$ and $A \subseteq V$ let $\langle x_A\rangle_G = \mu(x_A = 1) - \mu(x_A = -1)$ denote the mean of $x_A$ under the corresponding Ising measure on $G$. Similarly, for $U \subseteq V$ let $\langle x_A\rangle_U^0$ and $\langle x_A\rangle_U^+$ denote the magnetization induced by the Ising measure subject to free (i.e. $x_u = 0$) and plus (i.e. $x_u = +1$) boundary conditions, respectively, at all $u \notin U$. Then, $\langle x_A\rangle_U^0 \le \langle x_A\rangle_G \le \langle x_A\rangle_U^+$ for any $A \subseteq U$. Further, $U \mapsto \langle x_A\rangle_U^0$ is monotone non-decreasing and $U \mapsto \langle x_A\rangle_U^+$ is monotone non-increasing, both with respect to set inclusion (among sets $U$ that contain $A$).

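Proof. From Griffiths inequalities we know that $J \mapsto \mathbb{E}_J[X_A]$ is monotone non-decreasing (where $J \ge \hat J$ if and only if $J_R \ge \hat J_R$ for all $R \subseteq V$). Further, $\langle x_A\rangle_G = \mathbb{E}_{J^0}[X_A]$ where $J^0_{\{i\}} = B_i$, $J^0_{\{i,j\}} = \beta$ when $(i,j) \in E$ and all other values of $J^0$ are zero. Considering

\[ J^{\eta,U}_R = J^0_R + \eta\, \mathbb{I}(R \subseteq U^c, |R| = 1) , \]

with $\eta \mapsto J^{\eta,U}$ non-decreasing, so is $\eta \mapsto \mathbb{E}_{J^{\eta,U}}[X_A]$. In addition, $\mu_{J^{\eta,U}}(x_u = -1) \le C e^{-2\eta}$ whenever $u \notin U$. Hence, as $\eta \uparrow \infty$ the measure $\mu_{J^{\eta,U}}$ converges to $\mu_J$ subject to plus boundary conditions $x_u = +1$ for $u \notin U$. Consequently,

\[ \langle x_A\rangle_G \le \mathbb{E}_{J^{\eta,U}}[X_A] \uparrow \langle x_A\rangle_U^+ . \]

Similarly, let $J^U_R = J^0_R\, \mathbb{I}(R \subseteq U)$, noting that under $\mu_{J^U}$ the random vector $X_U$ is distributed according to the Ising measure $\mu$ restricted to $G_U$ (alternatively, having free boundary conditions $x_u = 0$ for $u \notin U$). With $A \subseteq U$ we thus deduce that

\[ \langle x_A\rangle_U^0 = \mathbb{E}_{J^U}[X_A] \le \mathbb{E}_{J^0}[X_A] = \langle x_A\rangle_G . \]

Finally, the stated monotonicity of $U \mapsto \langle x_A\rangle_U^0$ and $U \mapsto \langle x_A\rangle_U^+$ is, in view of Griffiths inequalities, the direct consequence of the monotonicity (with respect to set inclusions) of $U \mapsto J^U$ and $U \mapsto J^{\eta,U}$, respectively. □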
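The comparison in Lemma 2.12 can be checked by brute force on a small graph. The following is a minimal sketch, not part of the original argument: the function names and the example graph in the usage note are illustrative assumptions, and the enumeration is only feasible for a handful of vertices.

```python
import itertools
import math

def magnetization(vertices, edges, beta, fields, A, fixed=None):
    """Mean of x_A = prod_{u in A} x_u under the Ising measure, by enumeration.

    `fixed` maps a vertex to a pinned spin (+1 models plus boundary conditions).
    """
    fixed = fixed or {}
    free = [v for v in vertices if v not in fixed]
    num = den = 0.0
    for spins in itertools.product((+1, -1), repeat=len(free)):
        x = dict(fixed)
        x.update(zip(free, spins))
        energy = beta * sum(x[i] * x[j] for i, j in edges)
        energy += sum(fields[v] * x[v] for v in vertices)
        w = math.exp(energy)
        num += w * math.prod(x[u] for u in A)
        den += w
    return num / den

def boundary_magnetizations(vertices, edges, beta, fields, A, U):
    """Return (free, full graph, plus) magnetizations of x_A for the subset U."""
    U = set(U)
    # Free boundary: drop every vertex outside U together with its edges.
    inner_edges = [(i, j) for i, j in edges if i in U and j in U]
    free = magnetization(sorted(U), inner_edges, beta, fields, A)
    full = magnetization(vertices, edges, beta, fields, A)
    # Plus boundary: pin every vertex outside U to +1.
    plus = magnetization(vertices, edges, beta, fields, A,
                         fixed={v: +1 for v in vertices if v not in U})
    return free, full, plus
```

For instance, on a 5-cycle with one chord, calling `boundary_magnetizations` with `A = [0, 1]` and the nested sets `U = [0, 1, 2]` and `U = [0, 1, 2, 3]` should reproduce both the sandwich inequality and the two monotonicity statements of the lemma.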
In addition to Griffiths inequalities, the proof of Theorem 2.10 also uses the GHS inequality [48], which concerns the effect of a magnetic field on the local magnetizations at various vertices. It further uses an extension of Simon's inequality (about the centered two point correlation functions in ferromagnetic Ising models with zero magnetic field, see [82, Theorem 2.1]) to arbitrary magnetic field, in the case of Ising models on trees. Namely, [29, Lemma 4.3] states that if edge $(i,j)$ is on the unique path from $ø$ to $k \in \mathsf{T}(\ell)$, with $j$ a descendant of $i \in \partial\mathsf{T}(t)$, $t \ge 0$, then

\[ \langle x_{ø}; x_k\rangle_{ø}^{(\ell)} \le \cosh^2(2\beta + B_i)\, \langle x_{ø}; x_i\rangle_{ø}^{(t)}\, \langle x_j; x_k\rangle_j^{(\ell)} . \tag{2.15} \]

2.3. Algorithmic implications: belief propagation 

The 'belief propagation' (BP) algorithm consists of solving by iterations a collection of Bethe-Peierls (or cavity) mean field equations. More precisely, for the Ising model (1.20) we associate to each directed edge in the graph $i \to j$, with $(i,j) \in G$, a distribution (or 'message') $\nu_{i\to j}(x_i)$ over $x_i \in \{+1,-1\}$, using then the following update rule

\[ \nu^{(t+1)}_{i\to j}(x_i) = \frac{1}{z^{(t)}_{i\to j}}\, e^{B x_i} \prod_{l \in \partial i \setminus j} \sum_{x_l} e^{\beta x_i x_l}\, \nu^{(t)}_{l\to i}(x_l) \tag{2.16} \]

starting at a positive initial condition, namely where $\nu^{(0)}_{i\to j}(+1) \ge \nu^{(0)}_{i\to j}(-1)$ at each directed edge.

Applying Theorem 2.10 we establish in [29, Section 5] the uniform expo- 
nential convergence of the BP iteration to the same fixed point of (2.16), 
irrespective of its positive initial condition. As we further show there, for 
tree-like graphs the limit of the BP iteration accurately approximates local 
marginals of the Boltzmann measure (1.20). 

Theorem 2.13. Assume $\beta \ge 0$, $B > 0$ and $G$ is a graph of finite maximal degree $\Delta$. Then, there exist $A = A(\beta,B,\Delta)$ and $c = c(\beta,B,\Delta)$ finite, $\lambda = \lambda(\beta,B,\Delta) > 0$ and a fixed point $\{\nu^*_{i\to j}\}$ of the BP iteration (2.16) such that for any positive initial condition $\{\nu^{(0)}_{l\to k}\}$ and all $t \ge 0$,

\[ \sup_{(i,j) \in E} \| \nu^{(t)}_{i\to j} - \nu^*_{i\to j} \|_{\rm TV} \le A \exp(-\lambda t) . \tag{2.17} \]

Further, for any $i_o \in V$, if $\mathsf{B}_{i_o}(t)$ is a tree then for $U \equiv \mathsf{B}_{i_o}(r)$

\[ \| \mu_U - \nu_U \|_{\rm TV} \le \exp\{ c^{r+1} - \lambda(t - r) \} , \tag{2.18} \]

where $\mu_U(\cdot)$ is the law of $x_U \equiv \{x_i : i \in U\}$ under the Ising model (1.20) and $\nu_U$ the probability distribution

\[ \nu_U(x_U) = \frac{1}{z_U} \exp\Big\{ \beta \sum_{(i,j) \in E_U} x_i x_j + B \sum_{i \in U \setminus \partial U} x_i \Big\} \prod_{i \in \partial U} \nu^*_{i\to j(i)}(x_i) , \tag{2.19} \]

with $E_U$ the edge set of $U$ whose border is $\partial U$ (i.e. the set of its vertices at distance $r$ from $i_o$), and $j(i)$ is any fixed neighbor in $U$ of $i$.
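The update rule (2.16) is short enough to sketch directly. The code below is an illustrative sketch under stated assumptions: each message is stored as the single number $\nu_{i\to j}(+1)$, the field $B$ is uniform, all messages are updated in parallel, and the names `bp_ising` and `bp_magnetization` are hypothetical. The second function uses the messages to estimate the magnetization at a vertex, which is exact when the graph is a tree.

```python
import math

def bp_ising(edges, beta, B, init_plus=0.5, iters=200):
    """Iterate the BP update (2.16); nu[(i, j)] stores nu_{i->j}(+1)."""
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    # A positive initial condition means init_plus >= 1/2.
    nu = {(i, j): init_plus for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in nu:
            w = {}
            for xi in (+1, -1):
                val = math.exp(B * xi)
                for l in nbrs[i] - {j}:
                    p = nu[(l, i)]
                    # Sum over x_l of exp(beta x_i x_l) nu_{l->i}(x_l).
                    val *= math.exp(beta * xi) * p + math.exp(-beta * xi) * (1 - p)
                w[xi] = val
            new[(i, j)] = w[+1] / (w[+1] + w[-1])
        nu = new
    return nu, nbrs

def bp_magnetization(nu, nbrs, beta, B, i):
    """BP estimate of the mean of x_i, using all incoming messages at i."""
    w = {}
    for xi in (+1, -1):
        val = math.exp(B * xi)
        for l in nbrs[i]:
            p = nu[(l, i)]
            val *= math.exp(beta * xi) * p + math.exp(-beta * xi) * (1 - p)
        w[xi] = val
    return (w[+1] - w[-1]) / (w[+1] + w[-1])
```

Running `bp_ising` from two different positive initial conditions on a small graph with cycles and $B > 0$ gives a quick numerical illustration of the convergence to a common fixed point asserted in (2.17).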

2.4. Free entropy density, from trees to graphs

Bethe-Peierls approximation (we refer to Section 3.1 for a general introduction) allows us to predict the asymptotic free entropy density for sequences of graphs that converge locally to conditionally independent trees. We start by explaining this prediction in a general setting, then state a rigorous result which verifies it for a specific family of graph sequences.

To be definite, assume that $B > 0$. Given a graph sequence $\{G_n\}$ that converges to a conditionally independent tree $\mathsf{T}$ with bounded average offspring number, let $L = \Delta_{ø}$ be the degree of its root. Define the 'cavity fields' $\{h_1, \dots, h_L\}$ by letting $h_j = \lim_{t\to\infty} h_j^{(t)}$ with $h_j^{(t)} \equiv \mathrm{atanh}[\langle x_j\rangle_j^{(t)}]$, where $\langle\,\cdot\,\rangle_j^{(t)}$ denotes expectation with respect to the Ising distribution on the subtree induced by $j \in \partial ø$ and all its descendants in $\mathsf{T}(t)$ (with free boundary conditions). We note in passing that $t \mapsto h_j^{(t)}$ is stochastically monotone (and hence has a limit in law) by Lemma 2.12. Further, $\{h_1, \dots, h_L\}$ are conditionally independent given $L$. Finally, define $\theta = \tanh(\beta)$ and

\[ h_{-j} = B + \sum_{k=1, k \ne j}^{L} \mathrm{atanh}[\theta \tanh(h_k)] . \tag{2.20} \]

The Bethe-Peierls free energy density is given by

\[ \varphi(\beta,B) \equiv \frac{1}{2}\mathbb{E}\{L\}\,\gamma(\theta) - \frac{1}{2}\mathbb{E}\Big\{ \sum_{j=1}^{L} \log[1 + \theta \tanh(h_{-j})\tanh(h_j)] \Big\} + \mathbb{E}\log\Big\{ e^{B}\prod_{j=1}^{L}[1 + \theta\tanh(h_j)] + e^{-B}\prod_{j=1}^{L}[1 - \theta\tanh(h_j)] \Big\} , \tag{2.21} \]

for $\gamma(u) \equiv -\frac{1}{2}\log(1 - u^2)$. We refer to Section 3.3 where this formula is obtained as a special case of the general expression for a Bethe-Peierls free energy. The prediction is extended to $B < 0$ by letting $\varphi(\beta,B) = \varphi(\beta,-B)$, and to $B = 0$ by letting $\varphi(\beta,0)$ be the limit of $\varphi(\beta,B)$ as $B \to 0$.

As shown in [29, Lemma 2.2], when $\mathsf{T} = \mathsf{T}(P,\rho,\infty)$ is a Galton-Watson tree, the random variables $\{h_j\}$ have a more explicit characterization in terms of the following fixed point distribution.

Lemma 2.14. In case $\mathsf{T} = \mathsf{T}(P,\rho,\infty)$ consider the random variables $\{h^{(t)}\}$ where $h^{(0)} \equiv 0$ and for $t \ge 0$,

\[ h^{(t+1)} \stackrel{d}{=} B + \sum_{i=1}^{K} \mathrm{atanh}[\theta\tanh(h_i^{(t)})] , \tag{2.22} \]

with $h_i^{(t)}$ i.i.d. copies of $h^{(t)}$ that are independent of the variable $K$ of distribution $\rho$. If $B > 0$ and $\overline{\rho} < \infty$ then $t \mapsto h^{(t)}$ is stochastically monotone (i.e. there exists a coupling under which $\mathbb{P}(h^{(t)} \le h^{(t+1)}) = 1$ for all $t$), and converges in law to the unique fixed point $h^*$ of (2.22) that is supported on $[0,\infty)$. In this case, $h_j$ of (2.21) are i.i.d. copies of $h^*$ that are independent of $L$.
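The distributional recursion (2.22) can be approximated numerically by keeping a finite population of samples and resampling from it at each step. This is a heuristic sketch (often called population dynamics), not a method used in the text; the function name, the population size and the `sample_K` callback are assumptions introduced for illustration.

```python
import math
import random

def cavity_population(beta, B, sample_K, pop_size=2000, iters=30, seed=0):
    """Approximate the law of h^(t) in (2.22) by a population of samples.

    sample_K(rng) must return one draw of the offspring number K (law rho).
    Returns the list of populations for t = 0, 1, ..., iters.
    """
    rng = random.Random(seed)
    theta = math.tanh(beta)
    pop = [0.0] * pop_size            # h^(0) = 0
    history = [pop]
    for _ in range(iters):
        new = []
        for _ in range(pop_size):
            k = sample_K(rng)
            # Draw k approximately independent copies of h^(t) from the population.
            h = B + sum(math.atanh(theta * math.tanh(rng.choice(pop)))
                        for _ in range(k))
            new.append(h)
        pop = new
        history.append(pop)
    return history
```

When $\rho$ is a point mass (for instance `sample_K = lambda rng: 2`, the regular tree with $k = 3$), every member of the population carries the same value and the recursion reduces to a deterministic, monotone iteration; this gives a simple sanity check before trying a genuinely random offspring law.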

The main result of [29] confirms the statistical physics prediction for the 
free entropy density. 

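Theorem 2.15. If $\overline{\rho}$ is finite then for any $B \in \mathbb{R}$, $\beta \ge 0$ and sequence $\{G_n\}_{n \in \mathbb{N}}$ of uniformly sparse graphs that converges locally to $\mathsf{T}(P,\rho,\infty)$,

\[ \lim_{n\to\infty} \frac{1}{n} \log Z_n(\beta,B) = \varphi(\beta,B) . \tag{2.23} \]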

We proceed to sketch the outline of the proof of Theorem 2.15. For uniformly sparse graphs that converge locally to $\mathsf{T}(P,\rho,\infty)$ the model (1.20) has a line of first order phase transitions for $B = 0$ and $\beta > \beta_c$ (that is, where the continuous function $B \mapsto \varphi(\beta,B)$ exhibits a discontinuous derivative). Thus, the main idea is to utilize the magnetic field $B$ to explicitly break the $+/-$ symmetry, and to carefully exploit the monotonicity properties of the ferromagnetic Ising model in order to establish the result even at $\beta > \beta_c$.

Indeed, since $\phi_n(\beta,B) \equiv \frac{1}{n}\log Z_n(\beta,B)$ is invariant under $B \to -B$ and is uniformly (in $n$) Lipschitz continuous in $B$ with Lipschitz constant one, for proving the theorem it suffices to fix $B > 0$ and show that $\phi_n(\beta,B)$ converges as $n \to \infty$ to the predicted expression $\varphi(\beta,B)$ of (2.21). This is obviously true for $\beta = 0$ since $\phi_n(0,B) = \log(2\cosh B) = \varphi(0,B)$. Next, denoting by $\langle\,\cdot\,\rangle_n$ the expectation with respect to the Ising measure on $G_n$ (at parameters $\beta$ and $B$), it is easy to see that

\[ \partial_\beta \phi_n(\beta,B) = \frac{1}{n} \sum_{(i,j) \in E_n} \langle x_i x_j\rangle_n = \frac{1}{2}\, \mathbb{E}_n\Big[ \sum_{j \in \partial i} \langle x_i x_j\rangle_n \Big] . \tag{2.24} \]

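With $|\partial_\beta \phi_n(\beta,B)| \le |E_n|/n$ bounded by the assumed uniform sparsity, it is thus enough to show that the expression in (2.24) converges to the partial derivative of $\varphi(\beta,B)$ with respect to $\beta$. Turning to compute the latter derivative, after a bit of real analysis we find that the dependence of $\{h_j, h_{-j}\}$ on $\beta$ can be ignored (c.f. [29, Corollary 6.3] for the proof of this fact in case $\mathsf{T} = \mathsf{T}(P,\rho,\infty)$). That is, hereafter we simply compute the partial derivative in $\beta$ of the expression (2.21) while considering the law of $\{h_j\}$ and $\{h_{-j}\}$ to be independent of $\beta$. To this end, setting $z_j = \tanh(h_j)$ and $y_j = \tanh(h_{-j})$, the relation (2.20) amounts to

\[ y_j = \frac{ e^{B}\prod_{k \ne j}(1 + \theta z_k) - e^{-B}\prod_{k \ne j}(1 - \theta z_k) }{ e^{B}\prod_{k \ne j}(1 + \theta z_k) + e^{-B}\prod_{k \ne j}(1 - \theta z_k) } , \]

from which it follows that

\[ \frac{\partial}{\partial\theta}\Big\{ \sum_{j=1}^{L} \log(1 + \theta z_j y_j) \Big\} = \frac{\partial}{\partial\theta} \log\Big\{ e^{B}\prod_{j=1}^{L}(1 + \theta z_j) + e^{-B}\prod_{j=1}^{L}(1 - \theta z_j) \Big\} = \sum_{j=1}^{L} \frac{z_j y_j}{1 + \theta z_j y_j} , \]

and hence a direct computation of the derivative in (2.21) leads to

\[ \partial_\beta \varphi(\beta,B) = \frac{1}{2}\, \mathbb{E}\Big[ \sum_{j=1}^{L} \langle x_{ø} x_j\rangle_{\mathsf{T}} \Big] , \tag{2.25} \]

where $\langle\,\cdot\,\rangle_{\mathsf{T}}$ denotes the expectation with respect to the Ising model

\[ \mu_{\mathsf{T}}(x_{ø}, x_1, \dots, x_L) = \frac{1}{z} \exp\Big\{ \beta \sum_{j=1}^{L} x_{ø} x_j + B x_{ø} + \sum_{j=1}^{L} h_j x_j \Big\} , \tag{2.26} \]

on the 'star' $\mathsf{T}(1)$ rooted at $ø$ and the random cavity fields $h_j$ of (2.21).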
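The passage from (2.21) to (2.25) rests on the fact that, on the star (2.26), the root-leaf correlation is $(\theta + z_j y_j)/(1 + \theta z_j y_j)$ with $z_j = \tanh(h_j)$ and $y_j = \tanh(h_{-j})$. The sketch below compares this closed form with a direct enumeration over the $2^{L+1}$ spin configurations; it is only a numerical check for fixed (non-random) cavity fields, and the function names are hypothetical.

```python
import itertools
import math

def star_correlations(beta, B, h):
    """Correlations <x_0 x_j>, j = 1..L, under the star measure (2.26), by enumeration."""
    L = len(h)
    num = [0.0] * L
    den = 0.0
    for x in itertools.product((+1, -1), repeat=L + 1):
        x0, leaves = x[0], x[1:]
        w = math.exp(beta * x0 * sum(leaves) + B * x0
                     + sum(hj * xj for hj, xj in zip(h, leaves)))
        den += w
        for j, xj in enumerate(leaves):
            num[j] += w * x0 * xj
    return [n / den for n in num]

def cavity_formula(beta, B, h):
    """(theta + z_j y_j) / (1 + theta z_j y_j), with y_j = tanh(h_{-j}) from (2.20)."""
    theta = math.tanh(beta)
    a = [math.atanh(theta * math.tanh(hk)) for hk in h]
    out = []
    for j, hj in enumerate(h):
        y = math.tanh(B + sum(a) - a[j])   # h_{-j} omits the contribution of leaf j
        z = math.tanh(hj)
        out.append((theta + z * y) / (1 + theta * z * y))
    return out
```

Summing the entries of `cavity_formula` over $j$ and halving gives the integrand on the right side of (2.25) for one realization of the cavity fields.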

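In comparison, fixing a positive integer $t$ and considering Lemma 2.12 for $A \equiv \{i,j\}$ and $U \equiv \mathsf{B}_i(t)$, we find that the correlation $\langle x_i x_j\rangle_n$ lies between the correlations $\langle x_i x_j\rangle^{0}_{\mathsf{B}_i(t)}$ and $\langle x_i x_j\rangle^{+}_{\mathsf{B}_i(t)}$ for the Ising model on the subgraph $\mathsf{B}_i(t)$ with free and plus, respectively, boundary conditions at $\partial\mathsf{B}_i(t)$. Thus, in view of (2.24)

\[ \frac{1}{2}\mathbb{E}_n\{F_0(\mathsf{B}_i(t))\} \le \partial_\beta \phi_n(\beta,B) \le \frac{1}{2}\mathbb{E}_n\{F_+(\mathsf{B}_i(t))\} , \]

where $F_{0/+}(\mathsf{B}_i(t)) \equiv \sum_{j \in \partial i} \langle x_i x_j\rangle^{0/+}_{\mathsf{B}_i(t)}$.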

Next, taking $n \to \infty$ we rely on the following consequence of the local convergence of a uniformly sparse graph sequence $\{G_n\}$ (c.f. [29, Lemma 6.4] for the derivation of a similar result).

Lemma 2.16. Suppose a uniformly sparse graph sequence $\{G_n\}$ converges locally to the random tree $\mathsf{T}$. Fix an integer $t \ge 0$ and a function $F(\cdot)$ on the collection of all possible subgraphs that may occur as $\mathsf{B}_i(t)$, such that $F(\mathsf{B}_i(t))/(|\partial i| + 1)$ is uniformly bounded and $F(\mathsf{T}_1) = F(\mathsf{T}_2)$ whenever $\mathsf{T}_1 \simeq \mathsf{T}_2$. Then,

\[ \lim_{n\to\infty} \mathbb{E}_n\{F(\mathsf{B}_i(t))\} = \mathbb{E}\{F(\mathsf{T}(t))\} . \tag{2.27} \]

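Indeed, applying this lemma for the functions $F_0(\cdot)$ and $F_+(\cdot)$ we find that

\[ \frac{1}{2}\mathbb{E}\{F_0(\mathsf{T}(t))\} \le \liminf_{n\to\infty} \partial_\beta\phi_n(\beta,B) \le \limsup_{n\to\infty} \partial_\beta\phi_n(\beta,B) \le \frac{1}{2}\mathbb{E}\{F_+(\mathsf{T}(t))\} . \]

To compute $F_{0/+}(\mathsf{T}(t))$ we first sum over the values of $x_k$ for $k \in \mathsf{T}(t) \setminus \mathsf{T}(1)$. This has the effect of reducing $F_{0/+}(\mathsf{T}(t))$ to the form of $\sum_{j \in \partial ø} \langle x_{ø} x_j\rangle_{\mathsf{T}}$, where the cavity fields are taken as $h_j^{(t),0/+} \equiv \mathrm{atanh}[\langle x_j\rangle_j^{(t),0/+}]$. Further, from (2.11) we deduce that as $t \to \infty$ both sets of cavity fields converge in law to the same limit $\{h_j\}$. Since $\mathbb{E}[\langle x_{ø} x_j\rangle_{\mathsf{T}}]$ are continuous with respect to such convergence in law, we get by (2.25) that

\[ \lim_{t\to\infty} \frac{1}{2}\mathbb{E}\{F_{0/+}(\mathsf{T}(t))\} = \partial_\beta\varphi(\beta,B) , \]

which completes the proof of the theorem.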

2.5. Coexistence at low temperature 

We focus here on the ferromagnetic Ising model on a random $k$-regular graph with $k \ge 3$ and zero magnetic field. In order to simplify derivations, it is convenient to use the so-called configuration model for random regular graphs [17]. A graph from this ensemble is generated by associating $k$ half edges to each $i \in [n]$ (with $kn$ even) and pairing them uniformly at random. In other words, the collection $E_n$ of $m \equiv kn/2$ edges is obtained by pairing the $kn$ half-edges. Notice that the resulting object is in fact a multi-graph, i.e. it might include double edges and self-loops. However, the number of such 'defects' is $O(1)$ as $n \to \infty$ and hence the resulting random graph model shares many properties with random $k$-regular graphs. With a slight abuse of notation, we keep denoting by $\mathbb{G}(k,n)$ the multi-graph ensemble.
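The pairing construction just described translates into a few lines of code. The following is a minimal sketch: shuffling the list of half-edges and pairing consecutive entries yields a uniformly random pairing; the helper `count_defects` and both function names are illustrative assumptions.

```python
import random
from collections import Counter

def configuration_model(n, k, seed=None):
    """Sample a k-regular multigraph on n vertices by pairing kn half-edges."""
    if (n * k) % 2:
        raise ValueError("kn must be even")
    rng = random.Random(seed)
    # Each vertex contributes k half-edges, labelled by the vertex itself.
    half_edges = [v for v in range(n) for _ in range(k)]
    rng.shuffle(half_edges)
    # Consecutive entries of the shuffled list form a uniformly random pairing.
    return [(half_edges[2 * e], half_edges[2 * e + 1]) for e in range(n * k // 2)]

def count_defects(edges):
    """Number of self-loops, and number of surplus copies of repeated edges."""
    loops = sum(1 for i, j in edges if i == j)
    mult = Counter(tuple(sorted(e)) for e in edges if e[0] != e[1])
    return loops, sum(c - 1 for c in mult.values())
```

Calling `count_defects(configuration_model(n, 3))` for increasing `n` gives an empirical feel for the statement that the number of self-loops and double edges stays bounded as the graph grows.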
For $\beta \ge 0$ we consider the distribution

\[ \mu_{n,\beta,k}(x) = \frac{1}{Z(G_n)} \exp\Big\{ \beta \sum_{(i,j) \in E_n} x_i x_j \Big\} . \tag{2.28} \]

Recall Remark 2.7 that any sequence of random graphs $G_n$ from the ensembles $\mathbb{G}(k,n)$ is almost surely uniformly sparse and converges locally to the infinite $k$-regular tree. Thus, considering the function

\[ \varphi_k(\beta,h) \equiv \frac{k}{2}\big\{ \gamma(\theta) - \log[1 + \theta\tanh^2(h)] \big\} + \log\big\{ [1 + \theta\tanh(h)]^k + [1 - \theta\tanh(h)]^k \big\} , \]

of $h \in \mathbb{R}$, and $\theta = \tanh(\beta)$, we have from Theorem 2.15 that

\[ \lim_{n\to\infty} \frac{1}{n}\log Z(G_n) = \varphi_k(\beta,h^*) , \tag{2.29} \]

where the cavity field $h^*$ is the largest solution of

\[ g(h) \equiv (k-1)\,\mathrm{atanh}[\theta\tanh(h)] - h = 0 . \tag{2.30} \]

Indeed, the expression for $\varphi_k(\beta,h^*)$ is taken from (2.21), noting that here $L = K + 1 = k$ is non-random, hence so are $h_{-j} = h_j = h^*$. It is not hard to check by calculus that the limit as $B \downarrow 0$ of the unique positive solution of $g(h) = -B$ is strictly positive if and only if $\beta > \beta_c \equiv \mathrm{atanh}(1/(k-1))$, in which case $g(-h) = -g(h)$ is zero if and only if $h \in \{0, \pm h^*\}$ with $g'(0) > 0$ and $g'(h^*) < 0$ (c.f. [64]).
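For concrete values of $k$ and $\beta$ the cavity field $h^*$ and the limit in (2.29) are easy to evaluate. The sketch below finds the largest root of (2.30) by bisection, using that $g(h) < (k-1)\beta - h$ to bracket the root; the function names and the tolerance are assumptions made for illustration.

```python
import math

def g(h, beta, k):
    """The function g(h) of (2.30)."""
    return (k - 1) * math.atanh(math.tanh(beta) * math.tanh(h)) - h

def largest_root(beta, k, tol=1e-12):
    """Largest solution h* of g(h) = 0, by bisection (returns 0 if beta <= beta_c)."""
    if (k - 1) * math.tanh(beta) <= 1:
        return 0.0
    # g > 0 just above zero since g'(0) > 0, and g((k-1) beta) < 0.
    lo, hi = 1e-9, (k - 1) * beta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, beta, k) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi_k(beta, h, k):
    """The function phi_k(beta, h) displayed before (2.29)."""
    theta, t = math.tanh(beta), math.tanh(h)
    gamma = -0.5 * math.log(1 - theta ** 2)
    return (0.5 * k * (gamma - math.log(1 + theta * t * t))
            + math.log((1 + theta * t) ** k + (1 - theta * t) ** k))
```

With $k = 3$ the threshold is $\beta_c = \mathrm{atanh}(1/2)$; evaluating `phi_k(beta, largest_root(beta, 3), 3)` on either side of it shows the free entropy density departing from its high temperature value $\log 2 + \frac{k}{2}\log\cosh\beta$.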

We expect coexistence in this model if and only if $\beta > \beta_c$ (where we have a line of first order phase transitions for the asymptotic free entropy at $B = 0$), and shall next prove the 'if' part.

Theorem 2.17. With probability one, the ferromagnetic Ising measures $\mu_{n,\beta,k}$ on uniformly random $k$-regular multi-graphs from the ensemble $\mathbb{G}(k,n)$ exhibit coexistence if $(k-1)\tanh(\beta) > 1$.
Proof. As in the proof of Theorem 1.5 (for the Curie-Weiss model), we again consider the partition of $\mathcal{X}^n$ to $\Omega_+ \equiv \{x : \sum_i x_i \ge 0\}$ and $\Omega_- \equiv \{x : \sum_i x_i < 0\}$. From the invariance of $\mu_{n,\beta,k}$ with respect to the sign change $x \mapsto -x$ it follows that $\mu_{n,\beta,k}(\Omega_+) = \mu_{n,\beta,k}(\Omega_0) + \mu_{n,\beta,k}(\Omega_-)$ where $\Omega_r \equiv \{x : \sum_i x_i = r\}$. Hence, to prove coexistence it suffices to show that for $\epsilon > 0$ small enough, with probability one

\[ \limsup_{n\to\infty} \frac{1}{n}\log\Big\{ \sum_{|r| \le n\epsilon} \mu_{n,\beta,k}(\Omega_r) \Big\} < 0 . \tag{2.31} \]

To this end, note that $\mu_{n,\beta,k}(\Omega_r) = Z_r(G_n)/Z(G_n)$ for the restricted partition function

\[ Z_r(G_n) \equiv \sum_{x \in \Omega_r} \exp\Big\{ \beta \sum_{(i,j) \in E_n} x_i x_j \Big\} . \tag{2.32} \]

Further, recall that by Markov's inequality and the Borel-Cantelli lemma, for any positive random variables $Y_n$, with probability one

\[ \limsup_{n\to\infty} \frac{1}{n}\log Y_n \le \limsup_{n\to\infty} \frac{1}{n}\log\mathbb{E}(Y_n) . \]

Thus, combining (2.29) with the latter inequality for $Y_n = \sum_{|r| \le n\epsilon} Z_r(G_n)$ we arrive at the inequality (2.31) upon proving the following lemma (c.f. [43, Section 5]).

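Lemma 2.18. Considering even values of $n$ and assuming $\beta > \beta_c$, we have that

\[ \lim_{\epsilon\to 0} \limsup_{n\to\infty} \frac{1}{n}\log\Big\{ \sum_{|r| \le n\epsilon} \mathbb{E} Z_r(G_n) \Big\} = \varphi_k(\beta,0) < \varphi_k(\beta,h^*) . \tag{2.33} \]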

Proof. First, following the calculus preceding (2.25) we get after some algebraic manipulations that

\[ \partial_h \varphi_k(\beta,h) = \frac{k\theta}{\cosh^2(h)} \big[ f(\tanh(u),c) - f(\tanh(h),c) \big] \]

for $c = c(h) \equiv \theta\tanh(h)$ and $u = u(h) \equiv (k-1)\,\mathrm{atanh}(c)$, where for $c \ge 0$ the function $f(x,c) = x/(1 + cx)$ is monotone increasing in $x \ge 0$. With $\beta > \beta_c$ we know already that $g(h) > 0$ (for $g(\cdot)$ of (2.30)), hence $u(h) > h$ for any $h \in (0,h^*)$. From the preceding expression for $\partial_h\varphi_k(\beta,h)$ and the monotonicity of $f(\cdot,c)$ we thus deduce that $\varphi_k(\beta,h^*) > \varphi_k(\beta,0)$.
Next, since $Z_r(G) = Z_{-r}(G)$ we shall consider hereafter only $r \ge 0$, setting $s \equiv (n-r)/2$. Further, let $\Delta_G(x)$ denote the number of edges $(i,j) \in E$ such that $x_i \ne x_j$ and $Z_r(G,\Delta)$ be the number of configurations $x \in \Omega_r$ such that $\Delta_G(x) = \Delta$. Since $|E| = m$ it follows that $\sum_{(i,j) \in E} x_i x_j = m - 2\Delta_G(x)$ and hence

\[ Z_r(G) = e^{\beta m} \sum_{\Delta=0}^{m} Z_r(G,\Delta)\, e^{-2\beta\Delta} . \]

By the linearity of the expectation and since the distribution of $G_n$ (chosen uniformly from $\mathbb{G}(k,n)$) is invariant under any permutation of the vertices, we have that

\[ \mathbb{E}\{Z_r(G_n,\Delta)\} = \sum_{x \in \Omega_r} \mathbb{P}\{\Delta_{G_n}(x) = \Delta\} = \binom{n}{s}\, \mathbb{P}\{\Delta_{G_n}(x^*) = \Delta\} = \binom{n}{s}\, \frac{ |\{G \in \mathbb{G}(k,n) \text{ and } \Delta_G(x^*) = \Delta\}| }{ |\mathbb{G}(k,n)| } , \]

where $x^*_i = -1$ for $i \le s$ and $x^*_i = 1$ for $s < i \le n$.

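The size of the ensemble $\mathbb{G}(k,n)$ is precisely the number of pairings of $nk$ objects, i.e.

\[ |\mathbb{G}(k,n)| = \mathcal{P}(nk) \equiv \frac{(nk)!}{(nk/2)!\, 2^{nk/2}} . \]

Similarly, the number of such pairings with exactly $\Delta$ edges of unequal end-points is

\[ |\{G \in \mathbb{G}(k,n) \text{ and } \Delta_G(x^*) = \Delta\}| = \binom{ks}{\Delta}\binom{\hat n}{\Delta}\, \Delta!\; \mathcal{P}(ks - \Delta)\, \mathcal{P}(\hat n - \Delta) , \]

where $\hat n \equiv k(n - s)$. Putting everything together we get that

\[ \mathbb{E}\{Z_r(G_n)\} = \frac{e^{\beta m}}{\mathcal{P}(2m)} \binom{n}{s} \sum_{\Delta=0}^{ks} \binom{ks}{\Delta}\binom{\hat n}{\Delta}\, \Delta!\; \mathcal{P}(ks - \Delta)\, \mathcal{P}(\hat n - \Delta)\, e^{-2\beta\Delta} . \tag{2.34} \]

Recall that for any $q \in [0,1]$

\[ n^{-1}\log\binom{n}{nq} = H(q) + o(1) , \qquad n^{-1}\log\mathcal{P}(n) = \frac{1}{2}\log\Big(\frac{n}{e}\Big) + o(1) , \]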


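where $H(x) \equiv -x\log x - (1-x)\log(1-x)$ denotes the binary entropy function.

Setting $\Delta = \delta k n$, $s = un$ and

\[ \psi_\beta(u,\delta) \equiv (u - \delta)\log(u - \delta) + (1 - u - \delta)\log(1 - u - \delta) + 2\delta\log\delta + 4\beta\delta , \]

we find upon substituting these estimates in the expression (2.34) that

\[ n^{-1}\log\mathbb{E}\{Z_r(G_n)\} = \frac{\beta k}{2} + (1 - k)H(u) - \frac{k}{2} \inf_{\delta \in [0,u]} \psi_\beta(u,\delta) + o(1) . \]

Differentiating $\psi_\beta(u,\delta)$ in $\delta$ we deduce that the infimum in the preceding expression is achieved for the positive solution $\delta = \delta_*(\beta,u)$ of $(u - \delta)(1 - u - \delta) = \delta^2 e^{4\beta}$. Using this value of $\delta$ we get that $n^{-1}\log\mathbb{E}\{Z_r(G_n)\} = \eta_k(\beta,u) + o(1)$, where

\[ \eta_k(\beta,u) \equiv \frac{\beta k}{2} + (1 - k)H(u) - \frac{k}{2}\big\{ u\log(u - \delta_*(\beta,u)) + (1 - u)\log(1 - u - \delta_*(\beta,u)) \big\} . \]

Next, note that $\delta_*(\beta,1/2) = 1/[2(1 + e^{2\beta})]$ from which we obtain after some elementary algebraic manipulations that $\eta_k(\beta,1/2) = \varphi_k(\beta,0)$. Further, as $\eta_k(\beta,u)$ is continuous in $u \ge 0$, we conclude that

\[ \limsup_{n\to\infty} n^{-1}\log\Big\{ \sum_{|r| \le \epsilon n} \mathbb{E} Z_r(G_n) \Big\} = \sup_{|2u - 1| \le \epsilon} \eta_k(\beta,u) , \]

which for $\epsilon \to 0$ converges to $\eta_k(\beta,1/2) = \varphi_k(\beta,0)$, as claimed. □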
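The 'elementary algebraic manipulations' at the end of this proof can be double-checked numerically. In the sketch below (function names are illustrative assumptions) the stationarity condition $(u-\delta)(1-u-\delta) = \delta^2 e^{4\beta}$ is rewritten as the quadratic $(e^{4\beta} - 1)\delta^2 + \delta - u(1-u) = 0$ and solved for its positive root; the code is intended for $0 < u < 1$.

```python
import math

def delta_star(beta, u):
    """Positive root of (u - d)(1 - u - d) = d^2 exp(4 beta)."""
    a = math.exp(4 * beta) - 1.0      # the equation reads a d^2 + d - u(1 - u) = 0
    c = u * (1 - u)
    if a == 0.0:
        return c
    return (-1 + math.sqrt(1 + 4 * a * c)) / (2 * a)

def psi(beta, u, d):
    """The function psi_beta(u, delta) of the proof."""
    return ((u - d) * math.log(u - d) + (1 - u - d) * math.log(1 - u - d)
            + 2 * d * math.log(d) + 4 * beta * d)

def binary_entropy(u):
    return -sum(p * math.log(p) for p in (u, 1 - u) if p > 0)

def eta_k(beta, u, k):
    """The exponent eta_k(beta, u), evaluated at the minimizer delta_star."""
    d = delta_star(beta, u)
    return (0.5 * beta * k + (1 - k) * binary_entropy(u)
            - 0.5 * k * (u * math.log(u - d) + (1 - u) * math.log(1 - u - d)))
```

Comparing `eta_k(beta, 0.5, k)` with $\log 2 + \frac{k}{2}\log\cosh\beta$, which is $\varphi_k(\beta,0)$, confirms the identity $\eta_k(\beta,1/2) = \varphi_k(\beta,0)$ for any chosen $\beta$ and $k$.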



3. The Bethe-Peierls approximation 

Bethe-Peierls approximation reduces the problem of computing partition 
functions and expectation values to the one of solving a set of non-linear 
equations. While in general this 'reduction' involves an uncontrolled error, 
for mean-field models it is expected to be asymptotically exact in the large 
system limit. In fact, in Section 2 we saw such a result for the ferromagnetic 
Ising model on sparse tree-like graphs. 

Bethe states, namely those distributions that are well approximated within the Bethe-Peierls scheme, play for mean-field models the role that pure Gibbs states do on infinite lattices (for the latter see [42]). For example, it is conjectured by physicists that a large class of models, including for instance the examples in Section 1, decompose into convex combinations of Bethe states.

In the context of mean field spin glasses, the Bethe-Peierls method was significantly extended by Mézard, Parisi and Virasoro to deal with proliferation of pure states [70]. In the spin glass jargon, this phenomenon is referred to as 'replica symmetry breaking,' and the whole approach is known as the 'cavity method'. A closely related approach is provided by the so-called TAP (Thouless-Anderson-Palmer) equations [70].

Section 3.1 outlines the rationale behind the Bethe-Peierls approximation of local marginals, based on the Bethe mean field equations (and the belief propagation algorithm for iteratively solving them). Complementing it, Section 3.2 introduces the Bethe free entropy. In Section 3.3 we explain how these ideas apply to the ferromagnetic Ising model, the Curie-Weiss model, the Sherrington-Kirkpatrick model and the independent set model. Finally, in Section 3.4 we define a notion of correlation decay which generalizes the so called 'extremality condition' in trees. We show that if the graphical model associated with a permissive graph-specification pair $(G,\psi)$ satisfies such a correlation decay condition then it is a Bethe state. Subject to a slightly stronger condition, [30] validates also the Bethe-Peierls approximation for its free entropy.

While in general extremality on the graph G does not coincide with ex- 
tremality on the associated tree model, in Section 5 we shall provide a suf- 
ficient condition for this to happen for models on random graphs. 

3.1. Messages, belief propagation and Bethe equations

Given a variable domain $\mathcal{X}$ and a simple finite graph $G \equiv (V,E)$ without double edges or self loops, let $\vec{E} \equiv \{i \to j : (i,j) \in E\}$ denote the induced set of directed edges. The Bethe-Peierls method provides an approximation for the marginal on $U \subset V$ of the probability measure $\mu \equiv \mu_{G,\psi}$, cf. Eq. (1.4). The basic idea is to describe the influence of the factors outside $U$ via factorized boundary conditions. Such a boundary law is fully specified by a collection of distributions on $\mathcal{X}$ indexed by the directed edges on the 'internal' boundary $\partial U = \{i \in U : \partial i \not\subseteq U\}$ of $U$ (where as usual $\partial i$ is the set of neighbors of $i \in V$). More precisely, this is described by appropriately choosing a set of messages.

Definition 3.1. A set of messages is a collection $\{\nu_{i\to j}(\cdot) : i \to j \in \vec{E}\}$ of probability distributions over $\mathcal{X}$ indexed by the directed edges in $G$.

A set of messages is permissive for a permissive graph-specification pair $(G,\psi)$ if $\nu_{i\to j}(x_i^{\rm p})$ are positive and further $\nu_{i\to j}(\cdot) = \psi_i(\cdot)/z_i$ whenever $\partial i = \{j\}$.
As we shall soon see, in this context the natural candidate for the Bethe-Peierls approximation is the following standard message set.

Definition 3.2. The standard message set for the canonical probability measure $\mu$ associated to a permissive graph-specification pair $(G,\psi)$ is $\nu^*_{i\to j}(x_i) \equiv \mu_i^{(ij)}(x_i)$, that is, the marginal on $i$ of the probability measure on $\mathcal{X}^V$

\[ \mu^{(ij)}(x) = \frac{1}{Z_{ij}} \prod_{(k,l) \in E \setminus (i,j)} \psi_{kl}(x_k,x_l) \prod_{k \in V} \psi_k(x_k) , \tag{3.1} \]

obtained from equation (1.4) upon 'taking out' the contribution $\psi_{ij}(\cdot,\cdot)$ of edge $(i,j)$ (and with $Z_{ij}$ an appropriate normalization constant).

Remark 3.3. Since $\psi$ is permissive, the measure $\mu^{(ij)}(\cdot)$ is well defined and strictly positive at $x = (x_1^{\rm p}, \dots, x_n^{\rm p})$. Further, the marginal on $i$ of $\mu^{(ij)}(\cdot)$ is precisely $\psi_i(x_i)/\sum_x \psi_i(x)$ whenever $\partial i = \{j\}$, so the collection $\{\nu^*_{i\to j}(\cdot)\}$ is indeed a permissive set of messages (per Definition 3.1).

In order to justify the Bethe-Peierls method let $\mu^{(i)}(\cdot)$ denote the probability measure obtained from the canonical measure of (1.4) when the vertex $i \in V$ and all edges incident on $i$ are removed from $G$. That is,

\[ \mu^{(i)}(x) \equiv \frac{1}{Z_i} \prod_{(k,l) \in E,\, i \notin (k,l)} \psi_{kl}(x_k,x_l) \prod_{k \in V,\, k \ne i} \psi_k(x_k) . \tag{3.2} \]

For any $U \subseteq V$ we let $\mu_U$ (respectively, $\mu_U^{(ij)}$, $\mu_U^{(i)}$) denote the marginal distribution of $x_U \equiv \{x_i : i \in U\}$ when $x$ is distributed according to $\mu$ (respectively $\mu^{(ij)}$, $\mu^{(i)}$).

Clearly, finding good approximations to the marginals of the modified models $\mu^{(ij)}$, $\mu^{(i)}$ is essentially equivalent to finding good approximations for the original model $\mu$. Our first step consists of deriving an identity between certain marginals of $\mu^{(ij)}$ in terms of marginals of $\mu^{(i)}$. Hereafter, we write $f(\cdot) \cong g(\cdot)$ whenever two non-negative functions $f$ and $g$ on the same domain differ only by a positive normalization constant. By definition we then have that

\[ \mu_{ij}^{(ij)}(x_i,x_j) \cong \psi_i(x_i) \sum_{x_{\partial i \setminus j}} \mu_{\partial i}^{(i)}(x_{\partial i}) \prod_{l \in \partial i \setminus j} \psi_{il}(x_i,x_l) . \tag{3.3} \]

To proceed, we let $\nu^*_{i\to j}(x_i) \equiv \mu_i^{(ij)}(x_i)$ and make the crucial approximate independence assumptions

\[ \mu_{ij}^{(ij)}(x_i,x_j) = \nu^*_{i\to j}(x_i)\, \nu^*_{j\to i}(x_j) + \mathrm{ERR} , \tag{3.4} \]

\[ \mu_{\partial i}^{(i)}(x_{\partial i}) = \prod_{l \in \partial i} \nu^*_{l\to i}(x_l) + \mathrm{ERR} , \tag{3.5} \]

where the error terms ERR are assumed to be small. Indeed, upon neglecting the error terms, plugging these expressions in equation (3.3), setting $x_j = x_j^{\rm p}$ and dividing by the positive common factor $\nu^*_{j\to i}(x_j)$, we get the following Bethe equations.

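Definition 3.4. Let $\mathcal{M}(\mathcal{X})$ denote the space of probability measures over $\mathcal{X}$ and consider the Bethe (or belief propagation, BP) mapping $\mathsf{T}$ of the space $\mathcal{M}(\mathcal{X})^{\vec{E}}$ of possible message sets to itself, whose value at $\nu$ is

\[ (\mathsf{T}\nu)_{i\to j}(x_i) \equiv \frac{\psi_i(x_i)}{z_{i\to j}} \prod_{l \in \partial i \setminus j} \Big[ \sum_{x_l \in \mathcal{X}} \psi_{il}(x_i,x_l)\, \nu_{l\to i}(x_l) \Big] , \tag{3.6} \]

where $z_{i\to j}$ is determined by the normalization condition $\sum_{x \in \mathcal{X}} (\mathsf{T}\nu)_{i\to j}(x) = 1$. The Bethe equations characterize fixed points of the BP mapping. That is,

\[ \nu_{i\to j}(x_i) \equiv (\mathsf{T}\nu)_{i\to j}(x_i) . \tag{3.7} \]

Remark 3.5. The BP mapping $\mathsf{T}$ is well defined when the specification $\psi$ is permissive. Indeed, in such a case there exists for each $i \to j \in \vec{E}$ and any message set $\nu$, a positive constant $z_{i\to j} \ge \psi_{\min}^{|\partial i|}$ for which $(\mathsf{T}\nu)_{i\to j} \in \mathcal{M}(\mathcal{X})$. Moreover, in this case by definition $(\mathsf{T}\nu)_{i\to j}(x)$ is positive at $x = x_i^{\rm p}$ and further, equals $\psi_i(x)/z_i$ whenever $\partial i = \{j\}$. In particular, any solution of the Bethe equations is a permissive set of messages.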

These equations characterize the set of messages $\{\nu^*_{i\to j}(\cdot)\}$ to be used in the approximation. The Bethe-Peierls method estimates marginals of the graphical model $\mu_{G,\psi}$ in a manner similar to that expressed by (3.4) and (3.5). For instance, $\mu_i(\cdot)$ is then approximated by

\[ \mu_i(x_i) \cong \psi_i(x_i) \prod_{j \in \partial i} \sum_{x_j} \psi_{ij}(x_i,x_j)\, \nu^*_{j\to i}(x_j) . \tag{3.8} \]

A more general expression will be provided in Section 3.4.

At this point the reader can verify that if $G$ is a (finite) tree then the error terms in equations (3.4) and (3.5) vanish, hence in this case the Bethe equations have a unique solution, which is precisely the standard message set for the canonical measure $\mu$. More generally, it is not hard to verify that in the framework of a (permissive) specification $\psi$ for a factor graph $G = (V,F,E)$ the Bethe equations are then

\[ \nu_{i\to a}(x_i) \cong \prod_{b \in \partial i \setminus a} \hat\nu_{b\to i}(x_i) , \qquad \hat\nu_{a\to i}(x_i) \cong \sum_{x_{\partial a \setminus i}} \psi_a(x_{\partial a}) \prod_{l \in \partial a \setminus i} \nu_{l\to a}(x_l) , \]

and that when the factor graph is a (finite) tree these equations have a unique solution which is precisely the standard message set for the (canonical) measure $\mu_{G,\psi}(\cdot)$ of (1.23). That is, $\nu_{i\to a}(\cdot)$ and $\hat\nu_{a\to i}(\cdot)$ are then the marginals on variable $i$ for factor graphs in which factor $a$ and all factors in $\partial i \setminus a$ are removed, respectively.

In view of the preceding, we expect such an approximation to be tight as soon as $G$ lacks short cycles or for a sequence of graphs that converges locally to a tree.

3.2. The Bethe free entropy 

Within the Bethe approximation all marginals are expressed in terms of the permissive messages $\{\nu_{i\to j}\}$ that solve the Bethe equations (3.7). Not surprisingly, the free entropy $\log Z(G,\psi)$ can also be approximated by the Bethe free entropy at this message set.

Definition 3.6. The real valued function on the space of permissive message sets

\[ \Phi_{G,\psi}(\nu) = - \sum_{(i,j) \in E} \log\Big\{ \sum_{x_i,x_j} \psi_{ij}(x_i,x_j)\, \nu_{i\to j}(x_i)\, \nu_{j\to i}(x_j) \Big\} + \sum_{i \in V} \log\Big\{ \sum_{x_i} \psi_i(x_i) \prod_{j \in \partial i} \sum_{x_j} \psi_{ij}(x_i,x_j)\, \nu_{j\to i}(x_j) \Big\} \tag{3.9} \]

is called the Bethe free entropy associated with the given permissive graph-specification pair $(G,\psi)$. In the following we shall often drop the subscripts and write $\Phi(\nu)$ for the Bethe free entropy.

In the spirit of the observations made at the end of Section 3.1, this approximation is exact whenever $G$ is a tree and the Bethe messages are used.
Proposition 3.7. Suppose $G$ is a tree and let $\nu^*$ denote the unique solution of the Bethe equations (3.7). Then, $\log Z(G,\psi) = \Phi_{G,\psi}(\nu^*)$.

Proof. We progressively disconnect the tree $G$ in a recursive fashion. In doing so, note that if $f(x) = f_1(x) f_2(x)/f_3(x)$ and $f_a(x) \cong p(x)$ for $a \in \{1,2,3\}$ and some probability distribution $p$, then

\[ \log\Big\{ \sum_x f(x) \Big\} = \log\Big\{ \sum_x f_1(x) \Big\} + \log\Big\{ \sum_x f_2(x) \Big\} - \log\Big\{ \sum_x f_3(x) \Big\} \tag{3.10} \]

(adopting hereafter the convention that $0/0 = 0$).

Proceeding to describe the first step of the recursion, fix an edge $(i,j) \in E$. Without this edge the tree $G$ breaks into disjoint subtrees $G^{(i)}$ and $G^{(j)}$ such that $i \in G^{(i)}$ and $j \in G^{(j)}$. Consequently, the measure $\mu^{(ij)}(\cdot)$ of (3.1) is then the product of two canonical measures, corresponding to the restriction of the specification $\psi$ to $G^{(i)}$ and to $G^{(j)}$, respectively. Let $Z_{i\to j}(x)$ denote the constrained partition function for the specification $\psi$ restricted to the subtree $G^{(i)}$ whereby we force the variable $x_i$ to take the value $x$. With $Z_{j\to i}(x)$ defined similarly for the subtree $G^{(j)}$, we obviously have that

\[ Z(G,\psi) = \sum_{x_i,x_j} Z_{i\to j}(x_i)\, \psi_{ij}(x_i,x_j)\, Z_{j\to i}(x_j) . \]

Further, recall our earlier observation that for a tree $G$ the unique solution $\{\nu^*_{i\to j}(\cdot)\}$ of (3.7) is $\{\mu_i^{(ij)}(\cdot)\}$. Hence, in this case $Z_{i\to j}(x_i) \cong \nu^*_{i\to j}(x_i)$, $Z_{j\to i}(x_j) \cong \nu^*_{j\to i}(x_j)$ and $\nu^*_{i\to j}(x_i)\psi_{ij}(x_i,x_j)\nu^*_{j\to i}(x_j) \cong \mu_{ij}(x_i,x_j)$. Setting $\psi^*_{i\to j}(x_i,x_j) \equiv \nu^*_{i\to j}(x_i)\psi_{ij}(x_i,x_j)$ (and likewise $\psi^*_{j\to i}(x_j,x_i) \equiv \nu^*_{j\to i}(x_j)\psi_{ij}(x_i,x_j)$), we next apply the identity (3.10) for $x = (x_i,x_j)$, $f_1(x) = Z_{i\to j}(x_i)\psi^*_{j\to i}(x_j,x_i)$, $f_2(x) = \psi^*_{i\to j}(x_i,x_j) Z_{j\to i}(x_j)$ and $f_3(x) = \nu^*_{i\to j}(x_i)\psi_{ij}(x_i,x_j)\nu^*_{j\to i}(x_j)$ to get that

\[ \log Z(G,\psi) = \log Z(G^{(i\to j)},\psi^{(i\to j)}) + \log Z(G^{(j\to i)},\psi^{(j\to i)}) - \log\varphi(i,j) , \]

where for each edge $(i,j) \in E$,

\[ \varphi(i,j) \equiv \sum_{x_i,x_j} \nu^*_{i\to j}(x_i)\, \psi_{ij}(x_i,x_j)\, \nu^*_{j\to i}(x_j) , \]

and the term

\[ Z(G^{(i\to j)},\psi^{(i\to j)}) \equiv \sum_{x_i,x_j} Z_{i\to j}(x_i)\, \psi^*_{j\to i}(x_j,x_i) \]

is the partition function for the (reduced size) subtree $G^{(i\to j)}$ obtained when adding to $(G^{(i)},\psi^{(i)})$ the edge $(i,j)$ and the vertex $j$ whose specification is now $\psi_j \equiv \nu^*_{j\to i}$. We have the analogous representation for

\[ Z(G^{(j\to i)},\psi^{(j\to i)}) \equiv \sum_{x_i,x_j} \psi^*_{i\to j}(x_i,x_j)\, Z_{j\to i}(x_j) . \]

It is not hard to verify that the unique solution of the Bethe equations (3.7) for the graph-specification $(G^{(i\to j)},\psi^{(i\to j)})$ coincides with $\nu^*(\cdot)$ at all directed edges of $G^{(i\to j)}$. Likewise, the unique solution of the Bethe equations (3.7) for the graph-specification $(G^{(j\to i)},\psi^{(j\to i)})$ coincides with $\nu^*(\cdot)$ at all directed edges of $G^{(j\to i)}$. Thus, recursively repeating this operation until we have dealt once with each edge of $G$, we find a contribution $-\log\varphi(k,l)$ from each $(k,l) \in E$, the sum of which is precisely the first term in (3.9), evaluated at the permissive set of messages $\nu^*_{i\to j}(\cdot)$. The residual graph remaining at this stage consists of disconnected 'stars' centered at vertices of $G$, with specification $\nu^*_{l\to k}(x_l)$ at vertices $l \in \partial k$ for the 'star' centered at $k \in V$ (and original specification at vertex $k$ and the edges $(k,l) \in E$). The log-partition function for such a star is $\log\{ \sum_{x_k} \psi_k(x_k) \prod_{l \in \partial k} \sum_{x_l} \psi^*_{l\to k}(x_l,x_k) \}$, so the aggregate of these contributions over all vertices of $G$ is precisely the second term in (3.9), evaluated at $\nu^*_{i\to j}(\cdot)$. □
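Proposition 3.7 lends itself to a direct numerical check on a small tree. The sketch below iterates the BP mapping (3.6) from uniform messages, evaluates the two sums of (3.9), and compares with a brute-force $\log Z$. It is illustrative only: the data layout (`psi_v[i][x]` for vertex weights, `psi_e[(i, j)][x_i][x_j]` for edge weights, states labelled `0..q-1`) and all function names are assumptions, not notation from the text.

```python
import itertools
import math

def run_bp(n, edges, psi_v, psi_e, q, iters=50):
    """Iterate the BP mapping (3.6); nu[(i, j)][x_i] is the message on i -> j."""
    nbrs = {i: set() for i in range(n)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)

    def w(i, l, xi, xl):
        # psi_{il}(x_i, x_l), whichever way the undirected edge was stored.
        return psi_e[(i, l)][xi][xl] if (i, l) in psi_e else psi_e[(l, i)][xl][xi]

    nu = {(i, j): [1.0 / q] * q for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in nu:
            msg = [psi_v[i][xi] * math.prod(
                       sum(w(i, l, xi, xl) * nu[(l, i)][xl] for xl in range(q))
                       for l in nbrs[i] - {j})
                   for xi in range(q)]
            z = sum(msg)
            new[(i, j)] = [m / z for m in msg]
        nu = new
    return nu, nbrs, w

def bethe_free_entropy(n, edges, psi_v, q, nu, nbrs, w):
    """Phi(nu) of (3.9): the vertex sum minus the edge sum."""
    edge_term = sum(
        math.log(sum(w(i, j, xi, xj) * nu[(i, j)][xi] * nu[(j, i)][xj]
                     for xi in range(q) for xj in range(q)))
        for i, j in edges)
    vertex_term = sum(
        math.log(sum(psi_v[i][xi] * math.prod(
                         sum(w(i, j, xi, xj) * nu[(j, i)][xj] for xj in range(q))
                         for j in nbrs[i])
                     for xi in range(q)))
        for i in range(n))
    return vertex_term - edge_term

def log_partition(n, edges, psi_v, psi_e, q):
    """Brute-force log Z, feasible only for very small n."""
    total = 0.0
    for x in itertools.product(range(q), repeat=n):
        total += (math.prod(psi_v[i][x[i]] for i in range(n))
                  * math.prod(psi_e[(i, j)][x[i]][x[j]] for i, j in edges))
    return math.log(total)
```

On a tree the iteration stabilizes after a number of rounds equal to the diameter, so a fixed `iters` well above that suffices; on graphs with cycles the same code still runs, but the two quantities are no longer expected to agree.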

Lemma 3.8. Solutions of the Bethe equations (3.7) for a given permissive graph-specification pair $(G,\psi)$ are stationary points of the corresponding Bethe free entropy $\Phi_{G,\psi}(\cdot)$. The converse holds when the $|\mathcal{X}|$-dimensional matrices $\{\psi_{ij}(x,y)\}$ are invertible for all $(i,j) \in E$.

Proof. From the formula (3.9) and our definition (3.6) we find that for any $j \to i \in \vec{E}$ and $x_j \in \mathcal{X}$,

\[ \frac{\partial\Phi(\nu)}{\partial\nu_{j\to i}(x_j)} = - \frac{ \sum_{x_i} \nu_{i\to j}(x_i)\, \psi_{ij}(x_i,x_j) }{ \sum_{x_i',x_j'} \nu_{i\to j}(x_i')\, \nu_{j\to i}(x_j')\, \psi_{ij}(x_i',x_j') } + \frac{ \sum_{x_i} (\mathsf{T}\nu)_{i\to j}(x_i)\, \psi_{ij}(x_i,x_j) }{ \sum_{x_i',x_j'} (\mathsf{T}\nu)_{i\to j}(x_i')\, \nu_{j\to i}(x_j')\, \psi_{ij}(x_i',x_j') } . \]

Hence, if $\{\nu_{i\to j}(\cdot)\}$ satisfies the Bethe equations (3.7), then $\partial\Phi(\nu)/\partial\nu_{j\to i}(x) = 0$ for all $x \in \mathcal{X}$ and any $j \to i \in \vec{E}$, as claimed.

Conversely, given a permissive specification, if a permissive set of messages $\nu$ is a stationary point of $\Phi(\cdot)$, then by the preceding we have that for any $i \to j \in \vec{E}$, some positive $c_{i\to j}$ and all $y \in \mathcal{X}$,

\[ \sum_x \big[ (\mathsf{T}\nu)_{i\to j}(x) - c_{i\to j}\, \nu_{i\to j}(x) \big] \psi_{ij}(x,y) = 0 . \]

By assumption the matrices $\{\psi_{ij}(x,y)\}$ are invertible, hence $\nu_{i\to j}(x) \cong (\mathsf{T}\nu)_{i\to j}(x)$ for any $i \to j \in \vec{E}$. The probability measures $\nu_{i\to j}$ and $(\mathsf{T}\nu)_{i\to j}$ are thus identical, for each directed edge $i \to j$. That is, the set of messages $\nu$ satisfies the Bethe equations for the given specification. □



3.3. Examples: Bethe equations and free entropy 

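In most of this section we consider the extension of the Ising measure (2.9) on $\{+1,-1\}^V$, of the form

\[ \mu_{\beta,B,J}(x) = \frac{1}{Z(\beta,B,J)} \exp\Big\{ \beta \sum_{(i,j) \in E} J_{ij} x_i x_j + \sum_{i \in V} B_i x_i \Big\} , \tag{3.11} \]

where $J = \{J_{ij}, (i,j) \in E\}$ for generic 'coupling constants' $J_{ij} \in \mathbb{R}$ as in the spin-glass example of (1.21). This model corresponds to the permissive specification $\psi_{ij}(x_i,x_j) = \exp(\beta J_{ij} x_i x_j)$ and $\psi_i(x_i) = \exp(B_i x_i)$. Since $\mathcal{X} = \{+1,-1\}$, any set of messages $\{\nu_{i\to j}\}$ is effectively encoded through the 'cavity fields'

\[ h_{i\to j} \equiv \frac{1}{2} \log\frac{\nu_{i\to j}(+1)}{\nu_{i\to j}(-1)} . \tag{3.12} \]

Using these cavity fields, we find the following formulas.

Proposition 3.9. The Bethe equations for the cavity fields and the measure $\mu_{\beta,B,J}(\cdot)$ are

\[ h_{i\to j} = B_i + \sum_{l \in \partial i \setminus j} \mathrm{atanh}\{ \theta_{il}\tanh(h_{l\to i}) \} , \tag{3.13} \]

where $\theta_{il} \equiv \tanh(\beta J_{il})$. The expected magnetization $\langle x_i\rangle$ for this measure is approximated (in terms of the Bethe cavity fields $h^*_{i\to j}$) as

\[ \langle x_i\rangle = \tanh\Big\{ B_i + \sum_{l \in \partial i} \mathrm{atanh}\{ \theta_{il}\tanh(h^*_{l\to i}) \} \Big\} , \tag{3.14} \]

and the Bethe free entropy of any permissive cavity field $h = \{h_{i\to j}\}$ is

\[ \Phi_{G,\beta,B,J}(h) = \frac{1}{2} \sum_{i \in V} \sum_{j \in \partial i} \big\{ \gamma(\theta_{ij}) - \log[1 + \theta_{ij}\tanh(h_{i\to j})\tanh(h_{j\to i})] \big\} + \sum_{i \in V} \log\Big\{ e^{B_i} \prod_{j \in \partial i} [1 + \theta_{ji}\tanh(h_{j\to i})] + e^{-B_i} \prod_{j \in \partial i} [1 - \theta_{ji}\tanh(h_{j\to i})] \Big\} , \tag{3.15} \]

where $\gamma(u) \equiv -\frac{1}{2}\log(1 - u^2)$.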

Proof. Expressing the BP mapping for the Ising measure $\mu(x) \equiv \mu_{\beta,B,J}(x)$
in terms of cavity fields we find that

\[ (\mathsf{T}h)_{i\to j}(x_i) \equiv \frac{e^{B_i x_i}}{z_{i\to j}} \prod_{l\in\partial i\setminus j} \cosh(h_{l\to i} + \beta J_{il} x_i) \]

for some positive normalization constants $z_{i\to j}$. Thus, the identity

\[ \frac{1}{2} \log \frac{\cosh(a+b)}{\cosh(a-b)} = \mathrm{atanh}(\tanh(a)\tanh(b)) , \]

leads to the formula (3.13) for the Bethe equations. The approximation (3.8)
of local marginals then results with
$\mu_i(x_i) \propto e^{B_i x_i} \prod_{l\in\partial i} \cosh(h^*_{l\to i} + \beta J_{il} x_i)$,
out of which we get the formula (3.14) for $\langle x_i\rangle = \mu_i(+1) - \mu_i(-1)$ by the
identity $\frac{1}{2}\log(a) = \mathrm{atanh}(\frac{a-1}{a+1})$. Next note that if $u = \tanh(b)$ then
$\gamma(u) = \log\cosh(b)$ and recall that by definition, for any $i \to j \in \vec{E}$ and $x \in \mathcal{X}$,

\[ \nu_{i\to j}(x) = \frac{\exp(x h_{i\to j})}{2\cosh(h_{i\to j})} . \tag{3.16} \]

Hence, using the identity

\[ \frac{1}{4} \sum_{x,y\in\{+1,-1\}} \frac{e^{a x y + b x + c y}}{\cosh(a)\cosh(b)\cosh(c)} = 1 + \tanh(a)\tanh(b)\tanh(c) , \]

the first term in the formula (3.9) of the Bethe free entropy $\Phi(\cdot)$ is in this
case

\[ - \sum_{(i,j)\in E} \gamma(\theta_{ij}) - \sum_{(i,j)\in E} \log\big[1 + \theta_{ij}\tanh(h_{i\to j})\tanh(h_{j\to i})\big] . \]

Similarly, using (3.16) and the identity

\[ \frac{1}{2} \sum_{x\in\{+1,-1\}} \frac{e^{a x y + b x}}{\cosh(a)\cosh(b)} = 1 + y\tanh(a)\tanh(b) , \]

for $y = x_i \in \{+1,-1\}$, we find that the second term in the formula (3.9) is
in our case

\[ \sum_{i\in V}\sum_{j\in\partial i} \gamma(\theta_{ij}) + \sum_{i\in V} \log\Big\{ e^{B_i} \prod_{j\in\partial i}\big[1 + \theta_{ij}\tanh(h_{j\to i})\big] + e^{-B_i} \prod_{j\in\partial i}\big[1 - \theta_{ij}\tanh(h_{j\to i})\big] \Big\} . \]

Combining the preceding expressions for the two terms of (3.9) we arrive at
the formula of (3.15). □
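
As a numerical illustration of Proposition 3.9 (not part of the original derivation), the following Python sketch iterates the cavity-field equations (3.13) on a small tree, evaluates (3.14) and (3.15), and compares them with brute-force enumeration. The function names `bethe_ising` and `exact_ising`, the vertex labelling $0,\dots,n-1$, and the synchronous update schedule are our own assumptions, chosen only for this example.

```python
import itertools
import math


def bethe_ising(edges, n, beta, B, J=None, iters=200):
    """Iterate (3.13) synchronously; return cavity fields, magnetizations (3.14)
    and the Bethe free entropy (3.15). Illustrative helper, not from the paper."""
    J = J or {e: 1.0 for e in edges}
    nbrs = {i: [] for i in range(n)}
    theta = {}
    for (i, j) in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        theta[(i, j)] = theta[(j, i)] = math.tanh(beta * J[(i, j)])

    def inflow(i, exclude=None):
        # sum over l in di (minus `exclude`) of atanh{theta_il tanh(h_{l->i})}
        return sum(math.atanh(theta[(l, i)] * math.tanh(h[(l, i)]))
                   for l in nbrs[i] if l != exclude)

    h = {(i, j): 0.0 for i in nbrs for j in nbrs[i]}  # h[(i, j)] = h_{i->j}
    for _ in range(iters):
        h = {(i, j): B[i] + inflow(i, exclude=j) for (i, j) in h}

    m = [math.tanh(B[i] + inflow(i)) for i in range(n)]

    def gamma(u):
        return -0.5 * math.log(1.0 - u * u)

    phi = 0.0
    for i in range(n):
        plus, minus = math.exp(B[i]), math.exp(-B[i])
        for j in nbrs[i]:
            t = theta[(i, j)]
            phi += 0.5 * (gamma(t) - math.log(
                1.0 + t * math.tanh(h[(i, j)]) * math.tanh(h[(j, i)])))
            plus *= 1.0 + t * math.tanh(h[(j, i)])
            minus *= 1.0 - t * math.tanh(h[(j, i)])
        phi += math.log(plus + minus)
    return h, m, phi


def exact_ising(edges, n, beta, B, J=None):
    """Brute-force log Z and magnetizations of the measure (3.11)."""
    J = J or {e: 1.0 for e in edges}
    Z, mag = 0.0, [0.0] * n
    for x in itertools.product((1, -1), repeat=n):
        w = math.exp(beta * sum(J[(i, j)] * x[i] * x[j] for (i, j) in edges)
                     + sum(B[i] * x[i] for i in range(n)))
        Z += w
        for i in range(n):
            mag[i] += w * x[i]
    return math.log(Z), [v / Z for v in mag]
```

On a tree the two computations should agree up to rounding, in line with Proposition 3.7; on graphs with cycles the Bethe quantities are only approximations and the iteration need not converge.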

We proceed with a few special models of interest. 

The Curie-Weiss model. This model, which we already considered in
Section 1.1, corresponds to $G = K_n$ (the complete graph of $n$ vertices), with
$B_i = B$ and $J_{ij} = 1/n$ for all $1 \le i \ne j \le n$. Since this graph-specification
pair is invariant under re-labeling of the vertices, the corresponding Bethe
equations (3.13) admit at least one constant solution $h^*_{i\to j} = h^*(n)$, possibly
dependent on $n$, such that

\[ h^*(n) = B + (n-1)\,\mathrm{atanh}\{\tanh(\beta/n)\tanh(h^*(n))\} . \]

These cavity fields converge as $n \to \infty$ to solutions of the (limiting) equation
$h^* = B + \beta\tanh(h^*)$. Further, the Bethe approximations (3.14) for the
magnetization $m(n) = \langle x_i\rangle$ are of the form $m(n) = \tanh(h^*(n)) + O(1/n)$
and thus converge as $n \to \infty$ to solutions of the (limiting) equation
$m = \tanh(B + \beta m)$. Indeed, we have already seen in Theorem 1.4 that the
Curie-Weiss magnetization (per spin) concentrates for large $n$ around the relevant
solutions of the latter equation.
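
The convergence just described can be checked numerically. The sketch below solves the finite-$n$ cavity equation and the limiting equation $m = \tanh(B + \beta m)$ by plain fixed-point iteration; the function names and the choice of starting point (the field $B$, assumed positive so that the iteration is monotone) are assumptions made for this illustration.

```python
import math


def cw_cavity_field(n, beta, B, iters=2000):
    """Fixed-point iteration for h*(n) = B + (n-1) atanh{tanh(beta/n) tanh(h*(n))}."""
    h = B
    for _ in range(iters):
        h = B + (n - 1) * math.atanh(math.tanh(beta / n) * math.tanh(h))
    return h


def cw_limit_magnetization(beta, B, iters=2000):
    """Fixed-point iteration for the limiting equation m = tanh(B + beta m)."""
    m = math.tanh(B)
    for _ in range(iters):
        m = math.tanh(B + beta * m)
    return m
```

For instance, with $\beta = 1.5$ and $B = 0.1$ one can compare `math.tanh(cw_cavity_field(10000, 1.5, 0.1))` with `cw_limit_magnetization(1.5, 0.1)`. When $B = 0$ and $\beta > 1$ the limiting equation has several solutions, and this simple iteration only finds the one reached from its starting point.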

Ising models on random k-regular graphs. By the same reasoning as
for the Curie-Weiss model, in case of a $k$-regular graph of $n$ vertices with
$J_{ij} = +1$, and $B_i = B$, the Bethe equations admit a constant solution
$h_{i\to j} = h^*$ such that

\[ h^* = B + (k-1)\,\mathrm{atanh}\{\theta\tanh(h^*)\} , \]

for $\theta \equiv \tanh(\beta)$, with the corresponding magnetization approximation
$m = \tanh\big(B + k\,\mathrm{atanh}\{\theta\tanh(h^*)\}\big)$ and Bethe free entropy

\[ n^{-1}\Phi_n(h^*) = \frac{k}{2}\Big\{ \gamma(\theta) - \log\big[1 + \theta\tanh^2(h^*)\big] \Big\}
+ \log\Big\{ e^{B}\big[1 + \theta\tanh(h^*)\big]^k + e^{-B}\big[1 - \theta\tanh(h^*)\big]^k \Big\} . \]
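
A minimal sketch of how these three formulas might be evaluated follows. The function name `regular_graph_bethe` and the fixed-point iteration started from $h = B$ are our own choices; for $B = 0$ at low temperature the iteration stays at the symmetric solution $h^* = 0$, which need not be the relevant one.

```python
import math


def regular_graph_bethe(k, beta, B, iters=2000):
    """Constant Bethe solution on a k-regular graph with J_ij = +1, B_i = B.
    Returns (h*, magnetization approximation, free entropy per vertex)."""
    theta = math.tanh(beta)
    h = B
    for _ in range(iters):
        h = B + (k - 1) * math.atanh(theta * math.tanh(h))
    t = math.tanh(h)
    m = math.tanh(B + k * math.atanh(theta * t))
    gamma = -0.5 * math.log(1.0 - theta ** 2)
    phi = 0.5 * k * (gamma - math.log(1.0 + theta * t * t)) + math.log(
        math.exp(B) * (1.0 + theta * t) ** k
        + math.exp(-B) * (1.0 - theta * t) ** k)
    return h, m, phi
```

Two sanity checks are available in closed form: at $\beta = 0$ the spins are independent, so the free entropy per vertex is $\log(2\cosh B)$, and for $k = 2$, $B = 0$ the formula returns $\log(2\cosh\beta)$, the free entropy density of the one-dimensional Ising chain.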

Ising models on k-regular trees. It is instructive to contrast the above
free entropy with the analogous result for rooted $k$-regular trees $T_k(\ell)$. From
Proposition 3.7 we know that the free entropy $\log Z_\ell(B,\beta)$ for the Ising
measure on the finite tree $T_k(\ell)$ is precisely the Bethe free entropy of (3.15)
for the unique solution of the Bethe equations (3.13) with $J_{ij} = +1$ and
$B_i = B$.

We denote by $n_t$ the number of vertices at generation $t \in \{0,\dots,\ell\}$ (thus
$n_0 = 1$ and $n_t = k(k-1)^{t-1}$ for $t \ge 1$), and by

\[ n(\ell) = |T_k(\ell)| = 1 + \frac{k\big((k-1)^{\ell} - 1\big)}{k-2} , \]

the total number of vertices in $T_k(\ell)$ (the leading $1$ counts the root). Due to symmetry of
$T_k(\ell)$, the Bethe cavity field assumes the same value $h_r$ on all directed edges leading from a
vertex at generation $\ell - r$ to one at generation $\ell - r - 1$ of $T_k(\ell)$. Thus, we
have

\[ h_r = B + (k-1)\,\mathrm{atanh}(\theta\tanh h_{r-1}) , \tag{3.17} \]

with initial condition $h_{-1} = 0$. Similarly, we denote by $h^{\ell}_r$ the value of the Bethe cavity
field on the $n_{\ell-r}$ directed edges leading from a vertex at generation $\ell - r - 1$
to one at generation $\ell - r$. We then have

\[ h^{\ell}_r = B + (k-2)\,\mathrm{atanh}(\theta\tanh h_r) + \mathrm{atanh}(\theta\tanh h^{\ell}_{r+1}) , \]

for $r = \ell-1, \ell-2, \dots, 0$, with initial condition $h^{\ell}_{\ell} = h_{\ell-1}$. The (Bethe) free
entropy is in this case

\[
\log Z_\ell(B,\beta) = (n(\ell) - 1)\gamma(\theta) - \sum_{r=0}^{\ell-1} n_{\ell-r} \log\big[1 + \theta\tanh h_r \tanh h^{\ell}_r\big]
+ \sum_{r=0}^{\ell} n_{\ell-r} \log\Big\{ e^{B}\big[1 + \theta\tanh h_{r-1}\big]^{k-1}\big[1 + \theta\tanh h^{\ell}_r\big]
+ e^{-B}\big[1 - \theta\tanh h_{r-1}\big]^{k-1}\big[1 - \theta\tanh h^{\ell}_r\big] \Big\} .
\]

Using the relation (3.17) you can verify that the preceding formula simplifies
to

\[
\log Z_\ell(B,\beta) = (n(\ell) - 1)\gamma(\theta)
+ \log\Big\{ e^{B}\big[1 + \theta\tanh h_{\ell-1}\big]^{k} + e^{-B}\big[1 - \theta\tanh h_{\ell-1}\big]^{k} \Big\}
+ \sum_{r=0}^{\ell-1} n_{\ell-r} \log\Big\{ e^{B}\big[1 + \theta\tanh h_{r-1}\big]^{k-1} + e^{-B}\big[1 - \theta\tanh h_{r-1}\big]^{k-1} \Big\} .
\]

The $\ell \to \infty$ limit can then be expressed in terms of the $k$-canopy tree $CT_k$
(c.f. Lemma 2.8). If $R$ denotes the random location of the root of $CT_k$, then
we get

\[ \lim_{\ell\to\infty} \frac{1}{n(\ell)} \log Z_\ell(B,\beta) =
\gamma(\theta) + \mathbb{E}\log\Big\{ e^{B}\big[1 + \theta\tanh h_{R-1}\big]^{k-1} + e^{-B}\big[1 - \theta\tanh h_{R-1}\big]^{k-1} \Big\} . \]
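
The simplified formula for $\log Z_\ell(B,\beta)$ is easy to test on a small tree. The sketch below evaluates it from the recursion (3.17) and compares it with a brute-force sum over all spin configurations; the helper names, and the convention that the root has $k$ children while every other internal vertex has $k-1$, are assumptions matching the generation sizes $n_t$ stated above.

```python
import itertools
import math


def tree_log_partition(k, ell, beta, B):
    """Simplified formula for log Z_ell(B, beta) on T_k(ell), using (3.17)."""
    theta = math.tanh(beta)
    gamma = -0.5 * math.log(1.0 - theta ** 2)
    n_gen = [1] + [k * (k - 1) ** (t - 1) for t in range(1, ell + 1)]
    n_total = sum(n_gen)
    h = [0.0]  # h[r + 1] stores h_r, so h[0] is the initial condition h_{-1} = 0
    for _ in range(ell):
        h.append(B + (k - 1) * math.atanh(theta * math.tanh(h[-1])))

    def site(field, degree):
        t = math.tanh(field)
        return math.log(math.exp(B) * (1.0 + theta * t) ** degree
                        + math.exp(-B) * (1.0 - theta * t) ** degree)

    log_z = (n_total - 1) * gamma + site(h[ell], k)  # root term uses h_{ell-1}
    for r in range(ell):
        log_z += n_gen[ell - r] * site(h[r], k - 1)  # uses h_{r-1}
    return log_z, n_total


def build_tree(k, ell):
    """Edge list of T_k(ell) with the root labelled 0."""
    edges, frontier, nxt = [], [0], 1
    for t in range(ell):
        new = []
        for v in frontier:
            for _ in range(k if t == 0 else k - 1):
                edges.append((v, nxt))
                new.append(nxt)
                nxt += 1
        frontier = new
    return edges, nxt


def brute_log_partition(edges, n, beta, B):
    """log Z by enumeration of all 2^n configurations (small n only)."""
    Z = 0.0
    for x in itertools.product((1, -1), repeat=n):
        Z += math.exp(beta * sum(x[i] * x[j] for i, j in edges) + B * sum(x))
    return math.log(Z)
```

For $k = 3$, $\ell = 2$ the tree has $1 + 3 + 6 = 10$ vertices, so the enumeration involves only $2^{10}$ terms.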

Locally tree-like graphs. Recall Remark 2.7, that $k$-regular graphs converge
locally to the Galton-Watson tree $T(P,\rho,\infty)$ with $P_k = 1$. More generally,
consider the ferromagnetic Ising model $\mu_{\beta,B}(x)$ of (1.20), namely, with
$J_{ij} = +1$ and $B_i = B$, for a uniformly sparse graph sequence $\{G_n\}$ that
converges locally to the random rooted tree $T$. Then, for any $n$ and cavity
field $h = \{h_{i\to j}\}$ we have from (3.15) that

\[
n^{-1}\Phi_n(h) = \frac{1}{2}\,\mathbb{E}_n\Big[ \sum_{j\in\partial i} \Big\{ \gamma(\theta) - \log\big[1 + \theta\tanh(h_{i\to j})\tanh(h_{j\to i})\big] \Big\} \Big]
+ \mathbb{E}_n\Big[ \log\Big\{ e^{B} \prod_{j\in\partial i}\big[1 + \theta\tanh(h_{j\to i})\big] + e^{-B} \prod_{j\in\partial i}\big[1 - \theta\tanh(h_{j\to i})\big] \Big\} \Big] ,
\]

where $\mathbb{E}_n$ corresponds to expectations with respect to a uniformly chosen
$i \in V_n$. For $n \to \infty$, as shown in Lemma 2.16 we have by local convergence
and uniform sparsity that these expectations converge to the corresponding
expectations on the tree $T$ rooted at ø. Consequently, we expect to have

\[
\lim_{n\to\infty} n^{-1}\Phi_n(h^*_n) = \frac{1}{2}\,\mathbb{E}\Big[ \sum_{j=1}^{L} \Big\{ \gamma(\theta) - \log\big[1 + \theta\tanh(h^*_{ø\to j})\tanh(h^*_{j\to ø})\big] \Big\} \Big]
+ \mathbb{E}\Big[ \log\Big\{ e^{B} \prod_{j=1}^{L}\big[1 + \theta\tanh(h^*_{j\to ø})\big] + e^{-B} \prod_{j=1}^{L}\big[1 - \theta\tanh(h^*_{j\to ø})\big] \Big\} \Big] ,
\]

where $L = |\partial ø|$, the variables $\{\tanh(h^*_{j\to ø})\}$ are the limit as $t \to \infty$ of the
Ising magnetizations $\langle x_j\rangle^{(t)}$ on the sub-trees of $j \in \partial ø$ and all its descendants
(in $T(t)$, either with free or plus boundary conditions), and for $j = 1, \dots, L$,

\[ h^*_{ø\to j} = B + \sum_{k=1,\,k\ne j}^{L} \mathrm{atanh}\{\theta\tanh(h^*_{k\to ø})\} . \]

Indeed, this is precisely the prediction (2.21) for the free entropy density of
ferromagnetic Ising models on such graphs (which is proved in [29] to hold
in case $T$ is a Galton-Watson tree).

The Sherrington-Kirkpatrick model. The Sherrington-Kirkpatrick spin-glass
model corresponds to the complete graph $G_n = K_n$ with the scaling
$\beta \to \beta/\sqrt{n}$, constant $B_i = B$ and $J_{ij}$ which are i.i.d. standard normal
random variables. Expanding the corresponding Bethe equations (3.13), we find
that for large $n$ and any $i \ne j$,

\[ h_{i\to j} = B + \frac{\beta}{\sqrt{n}} \sum_{l=1,\,l\ne i,j}^{n} J_{il}\tanh(h_{l\to i}) + o\Big(\frac{1}{\sqrt{n}}\Big) . \tag{3.18} \]

Similarly, expanding the formula (3.14), we get for the local magnetizations
$m_i \equiv \langle x_i\rangle$ and large $n$ that

\[ \mathrm{atanh}(m_i) = h_{i\to j} + \frac{\beta}{\sqrt{n}} J_{ij}\tanh(h_{j\to i}) + o\Big(\frac{1}{\sqrt{n}}\Big) = h_{i\to j} + \frac{\beta}{\sqrt{n}} J_{ij} m_j + o\Big(\frac{1}{\sqrt{n}}\Big) . \]

Substituting this in both sides of equation (3.18), and neglecting terms of
$O(n^{-1/2})$ yields the so-called TAP equations

\[ \mathrm{atanh}(m_i) = B + \frac{\beta}{\sqrt{n}} \sum_{l=1,\,l\ne i}^{n} J_{il} m_l - m_i \frac{\beta^2}{n} \sum_{l=1,\,l\ne i}^{n} J_{il}^2 (1 - m_l^2) . \tag{3.19} \]
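
To make the structure of the TAP equations (3.19) concrete, the sketch below evaluates their residual (left side minus right side) for a given magnetization vector. The function name `tap_residual` and the dense list-of-lists representation of a symmetric coupling matrix are assumptions for this illustration; the sketch does not attempt to solve the equations, since naive iteration of (3.19) is not guaranteed to converge.

```python
import math


def tap_residual(m, J, beta, B):
    """Residual of the TAP equations (3.19) at magnetizations m.
    J is a symmetric n x n matrix (list of lists); diagonal entries are ignored."""
    n = len(m)
    s = beta / math.sqrt(n)
    res = []
    for i in range(n):
        linear = sum(J[i][l] * m[l] for l in range(n) if l != i)
        onsager = sum(J[i][l] ** 2 * (1.0 - m[l] ** 2) for l in range(n) if l != i)
        rhs = B + s * linear - m[i] * s * s * onsager  # s * s equals beta^2 / n
        res.append(math.atanh(m[i]) - rhs)
    return res
```

At $\beta = 0$ the couplings drop out and $m_i = \tanh(B)$ gives zero residual, which is a convenient consistency check.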

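The independent set model. In this model, which is not within the
framework of (3.11), we consider the measure

\[ \mu_{G,\lambda}(x) = \frac{1}{Z(G,\lambda)} \lambda^{|x|} \prod_{(i,j)\in E} \mathbb{I}\big((x_i,x_j) \ne (1,1)\big) , \tag{3.20} \]

where $|x|$ denotes the number of non-zero entries in the vector $x \in \{0,1\}^V$.
It corresponds to the permissive specification $\psi_{ij}(x,y) = \mathbb{I}((x,y) \ne (1,1))$,
and $\psi_i(x) = \lambda^{x}$, having $x_i^p = 0$ for all $i \in V$. In this case the Bethe equations
are

\[ \nu_{i\to j} = \frac{1}{1 + \lambda \prod_{l\in\partial i\setminus j} \nu_{l\to i}} \]

for $\nu_{i\to j} \equiv \nu_{i\to j}(0)$ and their solution $\{\nu^*_{i\to j}\}$ provides the approximate
densities

\[ \mu(x_i = 1) = \frac{\lambda \prod_{j\in\partial i} \nu^*_{j\to i}}{1 + \lambda \prod_{j\in\partial i} \nu^*_{j\to i}} \]

and the approximate free entropy

\[ \Phi(\nu^*) = \sum_{i\in V} \log\Big\{ 1 + \lambda \prod_{j\in\partial i} \nu^*_{j\to i} \Big\} - \sum_{(i,j)\in E} \log\big[ \nu^*_{i\to j} + \nu^*_{j\to i} - \nu^*_{i\to j}\nu^*_{j\to i} \big] . \]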
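
The Bethe equations for the independent set model are simple enough to iterate directly. The following sketch does so on a small tree and compares the resulting densities and free entropy with exhaustive enumeration; `hardcore_bethe` and `hardcore_exact` are hypothetical helper names, and the uniform initialization $\nu_{i\to j} = 1/2$ is an arbitrary choice.

```python
import itertools
import math


def hardcore_bethe(edges, n, lam, iters=100):
    """Iterate nu_{i->j} = 1 / (1 + lam * prod_{l in di minus j} nu_{l->i}),
    where nu[(i, j)] stands for nu_{i->j}(0)."""
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    nu = {(i, j): 0.5 for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        nu = {(i, j): 1.0 / (1.0 + lam * math.prod(
            nu[(l, i)] for l in nbrs[i] if l != j)) for (i, j) in nu}
    occupied = [lam * math.prod(nu[(j, i)] for j in nbrs[i]) for i in range(n)]
    density = [w / (1.0 + w) for w in occupied]
    phi = sum(math.log(1.0 + w) for w in occupied) - sum(
        math.log(nu[(i, j)] + nu[(j, i)] - nu[(i, j)] * nu[(j, i)])
        for i, j in edges)
    return nu, density, phi


def hardcore_exact(edges, n, lam):
    """Brute-force log Z and occupation densities of the measure (3.20)."""
    Z, occ = 0.0, [0.0] * n
    for x in itertools.product((0, 1), repeat=n):
        if any(x[i] == 1 and x[j] == 1 for i, j in edges):
            continue  # not an independent set
        w = lam ** sum(x)
        Z += w
        for i in range(n):
            occ[i] += w * x[i]
    return math.log(Z), [v / Z for v in occ]
```

On a single edge, for example, both computations give $\log(1 + 2\lambda)$ and density $\lambda/(1+2\lambda)$; agreement on general trees is what Proposition 3.7 leads one to expect, while on graphs with cycles the Bethe values are only approximate.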



3.4. Extremality, Bethe states and Bethe-Peierls approximation

Following upon Section 3.1 we next define the Bethe-Peierls approximation
of local marginals in terms of a given set of messages. To this end,
recall that each subset $U \subseteq V$ has a (possibly infinite) diameter
$\mathrm{diam}(U) = \max\{d(i,j) : i,j \in U\}$ (where $d(i,j)$ is the number of edges traversed in
the shortest path on $G$ from $i \in V$ to $j \in V$), and it induces the subgraph
$G_U = (U,E_U)$ such that $E_U = \{(i,j) \in E : i,j \in U\}$.

Definition 3.10. Let $\mathcal{U}$ denote the collection of $U \subseteq V$ for which
$G_U = (U,E_U)$ is a tree and each $i \in \partial U$ is a leaf of $G_U$ (i.e. $|\partial i \cap U| = 1$ whenever
$i \in \partial U$). A set of messages $\{\nu_{i\to j}\}$ induces on each $U \in \mathcal{U}$ the probability
measure

\[ \nu_U(x_U) = \frac{1}{Z_U} \prod_{i\in U} \psi^*_i(x_i) \prod_{(ij)\in E_U} \psi_{ij}(x_i,x_j) , \tag{3.21} \]

where $\psi^*_i(\cdot) = \psi_i(\cdot)$ except for $i \in \partial U$ in which case
$\psi^*_i(\cdot) = \nu_{i\to u(i)}(\cdot)$ with $\{u(i)\} = \partial i \cap U$.

A probability measure $\rho(x)$ on $\mathcal{X}^V$ is $(\epsilon,r)$-Bethe approximated by a set
of messages $\{\nu_{i\to j}\}$ if

\[ \sup_{U\in\mathcal{U},\ \mathrm{diam}(U)\le 2r} \|\rho_U - \nu_U\|_{TV} \le \epsilon , \tag{3.22} \]

where $\rho_U(\cdot)$ denotes the marginal distribution of $x_U$ under $\rho(\cdot)$. We call any
such $\rho(\cdot)$ an $(\epsilon,r)$-Bethe state for the graph-specification pair $(G,\psi)$.

Remark 3.11. Note that if $i \notin \partial U$ is a leaf of an induced tree $G_U$ then
$\partial i = \{u(i)\}$ and if $\{\nu_{i\to j}\}$ is a permissive set of messages then
$\nu_{i\to u(i)}(\cdot)$ is proportional to $\psi_i(\cdot)$.
Consequently, in (3.21) we may and shall not distinguish between $\partial U$
and the collection of all leaves of $G_U$.

We phrase our error terms and correlation properties in terms of valid 
rate functions, and consider graphs that are locally tree-like. Namely, 

Definition 3.12. A valid rate function is a monotonically non-increasing
function $\delta : \mathbb{N} \to [0,1]$ that decays to zero as $r \to \infty$. By (eventually)
increasing $\delta(r)$, we assume, without loss of generality, that $\delta(r+1) \ge \delta_*\delta(r)$
for some positive $\delta_*$ and all $r \in \mathbb{N}$.

Given an integer $R \ge 0$ we say that $G$ is $R$-tree like if its girth exceeds
$2R+1$ (i.e. $B_i(R)$ is a tree for every $i \in V$).

We show in the sequel that the Bethe approximation holds when the 
canonical measure on a tree like graph satisfies the following correlation 
decay hypotheses. 

Definition 3.13. A probability measure $\rho$ on $\mathcal{X}^V$ is extremal for $G$ with
valid rate function $\delta(\cdot)$ if for any $A, B \subseteq V$,

\[ \|\rho_{A,B}(\cdot,\cdot) - \rho_A(\cdot)\rho_B(\cdot)\|_{TV} \le \delta(d(A,B)) , \tag{3.23} \]

where $d(A,B) = \min\{d(i,j) : i \in A, j \in B\}$ is the length of the shortest path
in $G$ between $A \subseteq V$ and $B \subseteq V$.

We consider the notions of Bethe measure and extremality for general
probability distributions over $\mathcal{X}^V$ (and not only for the canonical measure
$\mu_{G,\psi}(\cdot)$). The key (unproven) assumption of statistical physics approaches is
that the canonical measure (which is, ultimately, the object of interest) can
be decomposed as a unique convex combination of extremal measures, up to
small error terms. This motivates the name 'extremal'. Further, supposedly
each element of this decomposition can then be treated accurately within
its Bethe approximation.

Here is the first step in verifying this broad conjecture, dealing with the
case where the canonical measure $\mu_{G,\psi}(\cdot)$ is itself extremal.

Theorem 3.14. Let $\psi$ be a permissive specification for an $R$-tree like graph
$G$ and $\delta(\cdot)$ a valid rate function. If $\mu_{G,\psi}(\cdot)$ is extremal with rate $\delta(\cdot)$ then it
is $(\epsilon,r)$-Bethe approximated by its standard message set for $\epsilon = \exp(c^r)\delta(R-r)$
and all $r < R-1$, where the (universal) constant $c$ depends only on $|\mathcal{X}|$,
$\delta_*$, $\kappa$ and the maximal degree $\Delta \ge 2$ of $G$. In particular, $\mu_{G,\psi}(\cdot)$ is then an
$(\epsilon,r)$-Bethe state for this graph-specification pair.

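To prove the theorem, recall first that for any probability measures $\rho_a$, $a = 1,2$, on
a discrete set $\mathcal{Z}$ and $f : \mathcal{Z} \to [0, f_{\max}]$ we have the elementary bound

\[ \|\hat\rho_1 - \hat\rho_2\|_{TV} \le \frac{3 f_{\max}}{2\langle\rho_1,f\rangle} \|\rho_1 - \rho_2\|_{TV} , \tag{3.24} \]

where $\hat\rho_a(z) \equiv \rho_a(z)f(z)/\langle\rho_a,f\rangle$ and $\langle\rho_a,f\rangle \equiv \sum_{z\in\mathcal{Z}} \rho_a(z)f(z)$ (c.f. [29,
Lemma 3.3]). Further, it is easy to check that if $\mu(\cdot) = \mu_{G,\psi}(\cdot)$ and $(G,\psi)$
is a permissive graph-specification pair, then for any $C \subseteq V$,

\[ \mu_C(x_C^p) \ge |\mathcal{X}|^{-|C|} \kappa^{\Delta|C|} . \tag{3.25} \]

In addition, as shown in [30, Section 3], for such $\mu(\cdot)$, if $G_{U'}$ is a tree,
$(i,j) \in E_{U'}$ and $j \notin A \supseteq \partial U'$, then

\[ \|\mu^{(ij)}_{i|A}(\cdot|x_A) - \mu^{(ij)}_{i|A}(\cdot|y_A)\|_{TV} \le b\,\|\mu_{ij|A}(\cdot|x_A) - \mu_{ij|A}(\cdot|y_A)\|_{TV} , \tag{3.26} \]

for $b \equiv 2|\mathcal{X}|\kappa^{-(\Delta+1)}$ and all $x, y \in \mathcal{X}^V$. Finally, the following lemma is also
needed for our proof of the theorem.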

Lemma 3.15. If the canonical measure $\mu$ for a 2-tree like graph and a permissive
specification is extremal of valid rate function $\delta(\cdot)$ then for some finite
$K = K(|\mathcal{X}|,\kappa,\Delta)$ and any $A \subseteq V$

\[ \|\mu^{(ij)}_A - \mu_A\|_{TV} \le K\,\delta(d(\{i,j\},A)) . \]
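
Proof. Set $B = \partial i \cup \partial j \setminus \{i,j\}$ and $C = \bigcup_{l\in B} \partial l$ noting that $|B| \le 2(\Delta-1)$,
$|C| \le 2\Delta(\Delta-1)$ and since $G$ is 2-tree like, necessarily the induced subgraph
$G_B$ has no edges. Hence,

\[ \mu_B(x_B) \ge \mu_C(x_C^p)\,\mu_{B|C}(x_B|x_C^p)
= \mu_C(x_C^p) \prod_{l\in B} \frac{\psi_l(x_l) \prod_{k\in\partial l} \psi_{lk}(x_l,x_k^p)}{\sum_{x'_l} \psi_l(x'_l) \prod_{k\in\partial l} \psi_{lk}(x'_l,x_k^p)} , \]

so by the bound (3.25) we deduce that $\mu_B(x_B) \ge c_0$ for all $x_B$ and some
positive $c_0 = c_0(|\mathcal{X}|,\kappa,\Delta)$. Next assume, without loss of generality, that
$A \cap B = \emptyset$. Then

\[
\|\mu^{(ij)}_A - \mu_A\|_{TV} = \frac{1}{2} \sum_{x_A} \Big| \sum_{x_B} \mu^{(ij)}_B(x_B)\,\mu_{A|B}(x_A|x_B) - \sum_{x'_B} \mu_B(x'_B)\,\mu_{A|B}(x_A|x'_B) \Big|
\le \sup_{x_B,\,x'_B} \|\mu_{A|B}(\cdot|x_B) - \mu_{A|B}(\cdot|x'_B)\|_{TV}
\le \frac{1}{c_0^2}\,\mathbb{E}\big\{ \|\mu_{A|B}(\cdot|X^{(1)}_B) - \mu_{A|B}(\cdot|X^{(2)}_B)\|_{TV} \big\} ,
\]

where $X^{(1)}$ and $X^{(2)}$ are independent random configurations, each of distribution
$\mu$. Next, from the extremality of $\mu(\cdot)$ we deduce that

\[ \mathbb{E}\big\{ \|\mu_{A|B}(\cdot|X^{(1)}_B) - \mu_{A|B}(\cdot|X^{(2)}_B)\|_{TV} \big\} \le 2\delta(d(A,B)) , \]

so taking $K = 2/c_0^2$ we arrive at our thesis. □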



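Proof of Theorem 3.14. Fixing $r < R-1$, a permissive graph-specification
pair $(G,\psi)$ that is extremal for $R$-tree like graph $G$ with valid rate function
$\delta(\cdot)$ and $U \in \mathcal{U}$ with $\mathrm{diam}(U) \le 2r$, let $\overline{U}_{R'} = \{k \in V : d(k,U) \ge R'\}$ for
$R' = R - r > 1$. Note that

\[
\|\mu_U(\cdot) - \nu_U(\cdot)\|_{TV} \le \mathbb{E}\|\mu_U(\cdot) - \mu_{U|\overline{U}_{R'}}(\cdot|X_{\overline{U}_{R'}})\|_{TV}
+ \mathbb{E}\|\mu_{U|\overline{U}_{R'}}(\cdot|X_{\overline{U}_{R'}}) - \nu_U(\cdot)\|_{TV} , \tag{3.27}
\]

where $\nu_U$ corresponds to the standard message set (i.e. $\nu_{i\to j} = \mu^{(ij)}_i$ for
the measure $\mu^{(ij)}(\cdot)$ of (3.1)), and the expectation is with respect to the
random configuration $X$ of distribution $\mu$. The first term on the right side is
precisely $\|\mu_{U,\overline{U}_{R'}}(\cdot,\cdot) - \mu_U(\cdot)\mu_{\overline{U}_{R'}}(\cdot)\|_{TV}$ which for $\mu(\cdot)$ extremal of valid
rate function $\delta(\cdot)$ is bounded by $\delta(d(U,\overline{U}_{R'})) = \delta(R-r)$. Turning to the
second term, consider the permissive set of messages

\[ \tilde\nu_{i\to j}(x_i) = \mu^{(ij)}_{i|\overline{B}_i(R')}\big(x_i \,\big|\, X_{\overline{B}_i(R')}\big) , \]

where $\overline{B}_i(t)$ denotes the collection of vertices of distance at least $t$ from
$i$. Since $\mathrm{diam}(U) \le 2r$ there exists $i_o \in V$ such that $U \subseteq B_{i_o}(r)$ and as
$B_{i_o}(R)$ is a tree, the canonical measure for $B_{i_o}(R) \setminus G_U$ is the product of
the corresponding measures for the subtrees rooted at $i \in \partial U$. Noting that
$V \setminus B_{i_o}(R) \subseteq \overline{U}_{R'}$, it is thus not hard to verify that we have the representation

\[ \mu_{U|\overline{U}_{R'}}(x_U|X_{\overline{U}_{R'}}) = \frac{1}{\tilde{Z}_U} \prod_{i\in U} \tilde\psi^*_i(x_i) \prod_{(ij)\in E_U} \psi_{ij}(x_i,x_j) , \tag{3.28} \]

as in (3.21), corresponding to the messages $\{\tilde\nu_{i\to j}\}$ (i.e. with $\tilde\psi^*_i(\cdot) = \psi_i(\cdot)$
except for $i \in \partial U$ in which case $\tilde\psi^*_i(\cdot) = \tilde\nu_{i\to u(i)}(\cdot)$). Consequently, we proceed
to bound $\|\tilde\nu_U - \nu_U\|_{TV}$ by applying the inequality (3.24) for the function

\[ f(x_U) = \prod_{i\in U\setminus\partial U} \psi_i(x_i) \prod_{(ij)\in E_U} \psi_{ij}(x_i,x_j) \]

on $\mathcal{Z} = \mathcal{X}^U$ and probability measures $\rho_a$ that are uniform on $\mathcal{X}^{U\setminus\partial U}$ with
$\rho_1(x_{\partial U}) = \prod_{i\in\partial U} \nu_{i\to u(i)}(x_i)$ and $\rho_2(x_{\partial U}) = \prod_{i\in\partial U} \tilde\nu_{i\to u(i)}(x_i)$. To this end,
recall that $f(x_U) \le f_{\max} = \psi_{\max}^M$ for $M = |U| - |\partial U| + |E_U|$. Further, since
$G_U$ is a tree (hence $|E_U| \le |U|$), and $\psi$ is a permissive specification (also
when an edge is removed from $E$), upon applying (3.25) for $|C| = 1$, we have
that

\[
\langle\rho_1,f\rangle \ge \prod_{i\in U\setminus\partial U} \frac{\psi_i(x_i^p)}{|\mathcal{X}|} \prod_{(ij)\in E_U} \psi_{ij}(x_i^p,x_j^p) \prod_{i\in\partial U} \nu_{i\to u(i)}(x_i^p)
\ge f_{\max}\,|\mathcal{X}|^{-|U|}\,\kappa^{M+\Delta|\partial U|} \ge f_{\max}\,c_1^{-|U|} ,
\]

where $c_1 = |\mathcal{X}|\kappa^{-(\Delta+1)}$ is a finite constant. Consequently, we deduce upon
applying (3.24) that

\[
\|\mu_{U|\overline{U}_{R'}}(\cdot|X_{\overline{U}_{R'}}) - \nu_U(\cdot)\|_{TV} = \|\hat\rho_2 - \hat\rho_1\|_{TV} \le 2c_1^{|U|}\|\rho_1 - \rho_2\|_{TV}
\le 2c_1^{|U|} \sum_{i\in\partial U} \|\nu_{i\to u(i)} - \tilde\nu_{i\to u(i)}\|_{TV} . \tag{3.29}
\]

Following [30] we show in the sequel that

\[ \mathbb{E}\big\{ \|\nu_{i\to u(i)} - \tilde\nu_{i\to u(i)}\|_{TV} \big\} \le c_2\,\delta(R-r) , \tag{3.30} \]

for some finite $c_2 = c_2(|\mathcal{X}|,\Delta,\kappa,\delta_*)$ and all $i \in \partial U$. As $|\partial U| \le |U| \le
|B_{i_o}(r)| \le \Delta^{r+1}$, we can choose $c = c(|\mathcal{X}|,\Delta,\kappa,\delta_*)$ finite such that
$1 + 2c_1^{|U|}|\partial U|c_2 \le \exp(c^r)$. Then, combining the inequalities (3.27), (3.29) and
(3.30) results with

\[ \|\mu_U - \nu_U\|_{TV} \le \exp(c^r)\,\delta(R-r) , \]

for every $U \in \mathcal{U}$ of $\mathrm{diam}(U) \le 2r$ and $r < R-1$, which is the thesis of
Theorem 3.14.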

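As for the proof of (3.30), fixing $i \in \partial U$ let $A = \overline{B}_i(R')$ and
$\nu'_{i\to j} = \mu^{(ij)}_{i|A}(\cdot|X'_A)$ where $X'$ of distribution $\mu^{(ij)}$ is independent of $X$. Then,

\[ \mathbb{E}\{\|\nu_{i\to j} - \tilde\nu_{i\to j}\|_{TV}\} = \mathbb{E}\{\|\mathbb{E}\,\nu'_{i\to j} - \tilde\nu_{i\to j}\|_{TV}\}
\le \mathbb{E}\{\|\nu'_{i\to j} - \tilde\nu_{i\to j}\|_{TV}\} . \tag{3.31} \]

Further, setting $U' = B_i(R')$ note that $G_{U'}$ is a tree (since $G$ is $R$-tree like),
such that $\partial U' \subseteq A$ (while $\partial i$ and $A$ are disjoint). Thus, from (3.26) we have
that for any $j \in \partial i$,

\[ \|\nu'_{i\to j} - \tilde\nu_{i\to j}\|_{TV} = \|\mu^{(ij)}_{i|A}(\cdot|X'_A) - \mu^{(ij)}_{i|A}(\cdot|X_A)\|_{TV}
\le b\,\|\mu_{ij|A}(\cdot|X'_A) - \mu_{ij|A}(\cdot|X_A)\|_{TV} . \tag{3.32} \]

Taking the expectation with respect to the independent random configurations
$X'$ (of law $\mu^{(ij)}$) and $X$ (of law $\mu$), leads to

\[ \mathbb{E}\{\|\mu_{ij|A}(\cdot|X'_A) - \mu_{ij|A}(\cdot|X_A)\|_{TV}\}
\le 2\|\mu_{\{ij\},A} - \mu_{\{ij\}}\mu_A\|_{TV} + \|\mu^{(ij)}_A - \mu_A\|_{TV} . \]

For $\mu$ extremal of valid rate function $\delta(\cdot)$ the latter expression is, due to
Lemma 3.15, bounded by $(2+K)\delta(R'-1) \le (2+K)\delta(R-r)/\delta_*$, which
together with (3.31) and (3.32) results with (3.30). □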



4. Colorings of random graphs 

Given a graph $G = (V,E)$, recall that a proper $q$-coloring of $G$ is an assignment
of colors to the vertices of $G$ such that no edge has both end-points
of the same color. Deciding whether a graph is $q$-colorable is a classical
NP-complete constraint satisfaction problem. Here we shall study this problem
when $G$ is sparse and random. More precisely, we shall consider the uniform
measure $\mu_G(\cdot)$ over proper $q$-colorings of $G$, with $q \ge 3$.

As the average degree of $G$ increases, the measure $\mu_G(\cdot)$ undergoes several
phase transitions and exhibits coexistence when the average degree is within
a certain interval. Eventually, for any $q$, if the average degree is large enough,
a random graph becomes, with high probability, non $q$-colorable. Statistical
physicists have put forward a series of exact conjectures on these phase
transitions [57, 58, 78], but as of now most of them cannot be rigorously verified
(c.f. [1, 4, 5] for what has been proved so far).

We begin in Section 4.1 with an overview of the various phase transitions
as they emerge from the statistical mechanics picture. Some bounds on the
$q$-colorability of a random graph are proved in Section 4.2. Finally, Section
4.3 explores the nature of the coexistence threshold for $q$-coloring, in particular,
connecting it with the question of information reconstruction, to which
Section 5 is devoted.

4.1. The phase diagram: a broad picture 

Let $x = \{x_i : i \in V\}$ denote a $q$-coloring of the graph $G = (V,E)$ (i.e. for
each vertex $i$, let $x_i \in \{1,\dots,q\} \equiv \mathcal{X}_q$). Assuming that the graph $G$ admits
a proper $q$-coloring, the uniform measure over the set of proper $q$-colorings
of $G$ is

\[ \mu_G(x) = \frac{1}{Z_G} \prod_{(i,j)\in E} \mathbb{I}(x_i \ne x_j) , \tag{4.1} \]

with $Z_G$ denoting the number of proper $q$-colorings of $G$. We shall consider
the following two examples of a random graph $G = G_n$ over the vertex set
$V = [n]$:

(a). $G = G_{n,\alpha}$ is uniformly chosen from the Erdős-Rényi ensemble $G(\alpha,n)$
of graphs of $m = \lfloor n\alpha\rfloor$ edges (hence of average degree $2\alpha$).

(b). $G = G_{n,k}$ is a uniformly chosen random $k$-regular graph.

Heuristic statistical mechanics studies suggest a rich phase transition
structure for the measure $\mu_G(\cdot)$. For any $q \ge 4$, different regimes are separated
by three distinct critical values of the average degree: $0 < \alpha_d(q) <
\alpha_c(q) < \alpha_s(q)$ (the case $q = 3$ is special in that $\alpha_d(q) = \alpha_c(q)$, whereas $q = 2$
is rather trivial, as 2-colorability is equivalent to having no odd cycles, in
which case each connected component of $G$ admits two proper colorings,
independently of the coloring of the rest of $G$). In order to characterize such
phase transitions we will use two notions (apart from colorability), namely
coexistence and sphericity. To define the latter notion we recall that the
joint type of two color assignments $x = \{x_i : i \in V\}$ and $y = \{y_i : i \in V\}$
is a $q \times q$ matrix whose $x,y$ entry (for $x,y \in \{1,\dots,q\}$) is the fraction of
vertices with color $x$ in the first assignment and color $y$ in the second.

Definition 4.1. Let $\nu = \{\nu(x,y)\}_{x,y\in[q]}$ be the joint type of two independent
color assignments, each distributed according to $\mu_G(\cdot)$, with $\overline{\nu}(x,y) = 1/q^2$
denoting the uniform joint type. We say that $\mu_G$ is $(\epsilon,\delta)$-spherical if
$\|\nu - \overline{\nu}\|_2 \le \epsilon$ with probability at least $1 - \delta$.
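
For concreteness, the joint type and its distance from the uniform joint type can be computed as in the sketch below. The function names are ours, and colors are labelled $0,\dots,q-1$ rather than $1,\dots,q$ purely for indexing convenience.

```python
import math


def joint_type(x, y, q):
    """q x q matrix whose (a, b) entry is the fraction of vertices colored a in x and b in y."""
    n = len(x)
    nu = [[0.0] * q for _ in range(q)]
    for a, b in zip(x, y):
        nu[a][b] += 1.0 / n
    return nu


def distance_from_uniform(nu):
    """Euclidean distance between a joint type and the uniform joint type 1/q^2."""
    q = len(nu)
    return math.sqrt(sum((nu[a][b] - 1.0 / q ** 2) ** 2
                         for a in range(q) for b in range(q)))
```

Two identical balanced colorings give a joint type concentrated on the diagonal, far from uniform, whereas two colorings in which every pair of colors co-occurs equally often give distance zero.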

The various regimes of $\mu_G(\cdot)$ are characterized as follows (where all statements
are to hold with respect to the uniform choice of $G \in G(\alpha,n)$ with
probability approaching one as $n \to \infty$):

I. For $\alpha < \alpha_d(q)$ the set of proper $q$-colorings forms a unique compact
lump: there is no coexistence. Further, $\mu_G(\cdot)$ is with high probability
$(\epsilon,\delta)$-spherical for any $\epsilon,\delta > 0$.

II. For $\alpha_d(q) < \alpha < \alpha_c(q)$ the measure $\mu_G$ exhibits coexistence in the
sense of Section 1.1.2. More precisely, there exist $\epsilon > 0$, $C > 0$ and for
each $n$ a partition of the space of configurations $\mathcal{X}_q^n$ into $\mathcal{N} = \mathcal{N}_n$ sets
$\{\Omega_{\ell;n}\}$ such that for any $n$ and $1 \le \ell \le \mathcal{N}$,

\[ \frac{\mu_G(\partial_\epsilon\Omega_{\ell;n})}{\mu_G(\Omega_{\ell;n})} \le e^{-Cn} . \]

Furthermore, there exists $\Sigma = \Sigma(\alpha) > 0$, called complexity or configurational
entropy and a subfamily $\mathrm{Typ} = \mathrm{Typ}_n$ of the partition
$\{\Omega_{\ell;n}\}_{\ell\in\mathrm{Typ}}$ such that

\[ \sum_{\ell\in\mathrm{Typ}_n} \mu_G(\Omega_{\ell;n}) \ge 1 - e^{-C'n} , \]

for some $C' > 0$ independent of $n$ and

\[ e^{-n\Sigma - o(n)} \le \inf_{\ell\in\mathrm{Typ}_n} \mu_G(\Omega_{\ell;n}) \le \sup_{\ell\in\mathrm{Typ}_n} \mu_G(\Omega_{\ell;n}) \le e^{-n\Sigma + o(n)} , \]

so in particular, $|\mathrm{Typ}_n| = e^{n\Sigma + o(n)}$.

III. For $\alpha_c(q) < \alpha < \alpha_s(q)$ the situation is analogous to the last one,
but now $\mathcal{N}_n$ is sub-exponential in $n$. More precisely, for any $\delta > 0$,
a fraction $1 - \delta$ of the measure $\mu_G$ is comprised of $\mathcal{N}(\delta)$ elements of
the partition, whereby $\mathcal{N}(\delta)$ converges as $n \to \infty$ to a finite random
variable. Furthermore, $\mu_G(\cdot)$ is no longer spherical.

IV. For $\alpha_s(q) < \alpha$ the random graph $G_n$ is, with high probability, uncolorable
(i.e. non $q$-colorable).

Statistical mechanics methods provide semi-explicit expressions for the
threshold values $\alpha_d(q)$, $\alpha_c(q)$ and $\alpha_s(q)$ in terms of the solution of a certain
identity whose argument is a probability measure on the $(q-1)$-dimensional
simplex.



4.2. The COL-UNCOL transition 

Though the existence of a colorable-uncolorable transition is not yet established,
$q$-colorability is a monotone graph property (i.e. if $G$ is $q$-colorable,
so is any subgraph of $G$). As such, Friedgut's theory [2, 3] provides the first
step in this direction. Namely,

Theorem 4.2. Suppose the random graph $G_{n,\alpha}$ is uniformly chosen from
the Erdős-Rényi graph ensemble $G(\alpha,n)$. Then, for any $q \ge 3$ there exists
$\alpha_s(q;n)$ such that for any $\delta > 0$,

\[ \lim_{n\to\infty} \mathbb{P}\{G_{n,\alpha_s(q;n)(1-\delta)} \text{ is } q\text{-colorable}\} = 1 , \tag{4.2} \]
\[ \lim_{n\to\infty} \mathbb{P}\{G_{n,\alpha_s(q;n)(1+\delta)} \text{ is } q\text{-colorable}\} = 0 . \tag{4.3} \]

We start with a simple upper bound on the COL-UNCOL transition 
threshold. 

Proposition 4.3. The COL-UNCOL threshold is upper bounded as

\[ \alpha_s(q;n) \le \overline{\alpha}_s(q) \equiv \frac{\log q}{|\log(1 - 1/q)|} . \tag{4.4} \]

Proof. A $q$-coloring is a partition of the vertex set $[n]$ into $q$ subsets of sizes
$n_x$, $x \in \mathcal{X}_q$. Given a $q$-coloring, the probability that a uniformly chosen edge
has both end-points of the same color is

\[ \sum_{x\in\mathcal{X}_q} \binom{n_x}{2} \binom{n}{2}^{-1} \ge \frac{1}{q} - \frac{2}{n-1} . \]

Consequently, choosing first the $q$-coloring and then choosing uniformly the
$m$ edges to be included in $G = G_{n,\alpha}$ we find that the expected number of
proper $q$-colorings for our graph ensemble is bounded by

\[ \mathbb{E}\{Z_G\} \le q^n \Big( \frac{n+1}{n-1} - \frac{1}{q} \Big)^m . \]

Since $\mathbb{E}\{Z_G\} \to 0$ for $\alpha > \overline{\alpha}_s(q)$ our thesis follows from Markov's inequality.
□

Notice that $\overline{\alpha}_s(q) = q\log q\,[1 + o(1)]$ as $q \to \infty$. This asymptotic behavior
is known to be tight, for it is shown in [4] that

Theorem 4.4. The COL-UNCOL threshold is lower bounded as

\[ \alpha_s(q;n) \ge \underline{\alpha}_s(q) \equiv (q-1)\log(q-1) . \tag{4.5} \]

Sketch of proof. Let $Z$ denote the number of balanced $q$-colorings, namely
$q$-colorings having exactly $n/q$ vertices of each color. A computation similar
to the one we used when proving Proposition 4.3 yields the value of $\mathbb{E}Z$. It
captures enough of $\mathbb{E}Z_G$ to potentially yield a tight lower bound on $\alpha_s(q)$ by
the second moment method, namely, using the bound $\mathbb{P}(Z_G > 0) \ge \mathbb{P}(Z >
0) \ge (\mathbb{E}Z)^2/\mathbb{E}Z^2$. The crux of the matter is of course to control the second
moment of $Z$, for which we defer to [4]. □
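
To get a feeling for how close the two bounds are, note that the first-moment estimate of Proposition 4.3 is, to leading exponential order, $q^n(1-1/q)^{\alpha n}$, which vanishes once $\alpha > \log q/|\log(1-1/q)|$, while Theorem 4.4 gives the lower bound $(q-1)\log(q-1)$. The short sketch below evaluates both; the function names are ours.

```python
import math


def upper_bound(q):
    """First-moment upper bound log q / |log(1 - 1/q)| on the COL-UNCOL threshold."""
    return math.log(q) / abs(math.log(1.0 - 1.0 / q))


def lower_bound(q):
    """Second-moment lower bound (q - 1) log(q - 1) of Theorem 4.4."""
    return (q - 1) * math.log(q - 1)


if __name__ == "__main__":
    for q in (3, 4, 5, 10, 100):
        print(q, lower_bound(q), upper_bound(q), q * math.log(q))
```

Both bounds behave like $q\log q$ for large $q$, while for small $q$ the gap between them is substantial.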

The proof of Theorem 4.4 is non-constructive. In particular, it does not
suggest a way of efficiently finding a $q$-coloring when $\alpha$ is near $\alpha_s(q;n)$ (and
as of now, it is not even clear if this is possible). In contrast, we provide next
a simple, 'algorithmic' (though sub-optimal), lower bound on $\alpha_s(q;n)$. To
this end, recall that the $k$-core of a graph $G$ is the largest induced subgraph
of $G$ having minimal degree at least $k$.

Proposition 4.5. If $G$ does not have a non-empty $q$-core then it is
$q$-colorable.

Proof. Given a graph $G$ and a vertex $i$, denote by $G \setminus \{i\}$ the graph obtained
by removing vertex $i$ and all edges incident to it. If $G$ does not contain a
$q$-core, then we can sequentially remove vertices of degree less than $q$ (and the
edges incident to them), one at a time, until we have decimated the whole
graph. This simple 'peeling algorithm' provides an ordering $i(1), i(2), \dots,
i(n)$ of the vertices, such that setting $G_0 = G$ and $G_t = G_{t-1} \setminus \{i(t)\}$, we
have that for any $t \le n$, the degree of $i(t)$ in $G_{t-1}$ is smaller than $q$. Our
thesis follows from the observation that if $G \setminus \{i\}$ is $q$-colorable, and $i$ has
degree smaller than $q$, then $G$ is $q$-colorable as well. □
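
The peeling argument translates directly into code. The sketch below removes vertices of current degree less than $q$ until none is left, and, if the whole graph is decimated, colors the vertices in reverse removal order, always picking a color unused by already-colored neighbors. The function name, the edge-list input format and the labelling of colors by $0,\dots,q-1$ are our assumptions.

```python
def peel_and_color(n, edges, q):
    """Return a proper q-coloring (list of colors in 0..q-1) when the q-core of the
    graph on vertices 0..n-1 is empty, and None when a non-empty q-core remains."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    deg = [len(s) for s in nbrs]
    removed = [False] * n
    stack = [i for i in range(n) if deg[i] < q]
    order = []
    while stack:
        i = stack.pop()
        if removed[i]:
            continue
        removed[i] = True
        order.append(i)
        for j in nbrs[i]:
            if not removed[j]:
                deg[j] -= 1
                if deg[j] == q - 1:  # j has just become removable
                    stack.append(j)
    if len(order) < n:
        return None  # the vertices never removed form the q-core
    color = [None] * n
    for i in reversed(order):
        # fewer than q neighbors of i are colored at this point, so a color is free
        used = {color[j] for j in nbrs[i] if color[j] is not None}
        color[i] = next(c for c in range(q) if c not in used)
    return color
```

Note that a return value of `None` only says that this particular algorithm fails: a graph with a non-empty $q$-core may still be $q$-colorable.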

As mentioned before, this proof outlines an efficient algorithm for constructing
a $q$-coloring for any graph $G$ whose $q$-core is empty, and in principle,
also for enumerating in this case the number of $q$-colorings of $G$. The
threshold for the appearance of a $q$-core in a random Erdős-Rényi graph
chosen uniformly from $G(\alpha,n)$ was first determined in [80].

Proposition 4.6. Let $h_\alpha(u) = \mathbb{P}\{\mathrm{Poisson}(2\alpha u) \ge q-1\}$, and define (for
$q \ge 3$)

\[ \alpha_{\mathrm{core}}(q) = \sup\{\alpha \ge 0 : h_\alpha(u) \le u \ \ \forall u \in [0,1]\} . \tag{4.6} \]

Then, with high probability, a uniformly random graph $G$ from $G(\alpha,n)$ has
a $q$-core if $\alpha > \alpha_{\mathrm{core}}(q)$, and does not have one if $\alpha < \alpha_{\mathrm{core}}(q)$.

Sketch of proof. Starting the peeling algorithm at such graph $G_0 = G_{n,\alpha}$
yields an inhomogeneous Markov chain $t \mapsto G_t$ which is well approximated
by a chain of reduced state space $\mathbb{Z}_+^q$ and smooth transition kernel. The
asymptotic behavior of such chains is in turn governed by the solution of
a corresponding ODE, out of which we thus deduce the stated asymptotic
of the probability that a uniformly random graph $G$ from $G(\alpha,n)$ has a
$q$-core. We shall not detail this approach here, as we do so in Section 6.4 for
the closely related problem of finding the threshold for the appearance of a
2-core in a uniformly random hypergraph. □

We note in passing that the value of $\alpha_{\mathrm{core}}(q)$ can be a-priori predicted by
the following elegant heuristic 'cavity' argument. For a vertex $i \in V$ we call
'$q$-core induced by $i$' the largest induced subgraph having minimum degree
at least $q$ except possibly at $i$. We denote by $u$ the probability that for a
uniformly chosen random edge $(i,j)$, its end-point $i$ belongs to the $q$-core
induced by $j$. Recall that for large $n$ the degree $\Delta$ of the uniformly chosen
vertex $i$ of $G_{n,\alpha}$, excluding the distinguished edge $(i,j)$, is approximately a
Poisson$(2\alpha)$ random variable. We expect each of these $\Delta$ edges to connect
$i$ to a vertex from the $q$-core induced by $j$ with probability $u$ and following
the Bethe ansatz, these events should be approximately independent of each
other. Hence, under these assumptions the vertex $i$ is in the $q$-core induced by
$j$ with probability $h_\alpha(u)$, leading to the self-consistency equation $u = h_\alpha(u)$.
The threshold $\alpha_{\mathrm{core}}(q)$ then corresponds to the appearance of a positive
solution of this equation.
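
The threshold defined in (4.6) is easy to approximate numerically. The sketch below checks the condition $h_\alpha(u) \le u$ on a grid of $u \in [0,1]$ and bisects in $\alpha$, using the fact that $h_\alpha(u)$ is non-decreasing in $\alpha$. The names `h` and `alpha_core`, the grid size, and the initial bracket $[0, q^2]$ are assumptions made for this illustration; the grid check slightly overestimates the threshold.

```python
import math


def h(alpha, u, q):
    """h_alpha(u) = P{Poisson(2 alpha u) >= q - 1}."""
    lam = 2.0 * alpha * u
    return 1.0 - sum(math.exp(-lam) * lam ** j / math.factorial(j)
                     for j in range(q - 1))


def alpha_core(q, grid=2000, tol=1e-4):
    """Approximate sup{alpha >= 0 : h_alpha(u) <= u for all u in [0, 1]}."""
    def no_positive_solution(alpha):
        return all(h(alpha, s / grid, q) <= s / grid for s in range(grid + 1))

    lo, hi = 0.0, float(q * q)  # assumed bracket: condition holds at lo, fails at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if no_positive_solution(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For $q = 3$ this gives a value of $\alpha$ close to $1.68$, that is an average degree $2\alpha$ of about $3.35$.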

4.3. Coexistence and clustering: the physicist's approach 

For $\alpha < \alpha_s(q)$, the measure $\mu_G(\cdot)$ is well defined but can have a highly
non-trivial structure, as discussed in Section 4.1. We describe next the physicists'
conjecture for the corresponding threshold $\alpha_d(q)$ and the associated
complexity function $\Sigma(\alpha)$. For the sake of simplicity, we shall write the explicit
formulae in case of random $(k+1)$-regular ensembles instead of the
Erdős-Rényi ensembles $G(\alpha,n)$ we use in our overview.

4.3.1. Clustering and reconstruction thresholds: a conjecture

Following [66], the conjectured value for $\alpha_d(q)$ has a particularly elegant
interpretation in terms of a phase transition for a model on the rooted
Galton-Watson tree $T = T(P,\infty)$ with offspring distribution $P = \mathrm{Poisson}(2\alpha)$. With
an abuse of notation, let $\mu$ also denote the free boundary Gibbs measure over
proper $q$-colorings of $T$ (recall that every tree is 2-colorable). More explicitly,
a proper $q$-coloring $x = \{x_i \in \mathcal{X}_q : i \in T\}$ is sampled from $\mu$ as follows.
First sample the root color uniformly at random. Then, recursively, for each
colored node $i$, sample the colors of its offspring uniformly at random among
the colors that are different from $x_i$.
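
The recursive sampling procedure just described can be sketched as follows, for a finite rooted tree given as a dictionary mapping each vertex to the list of its children. The function name, the input format and the labelling of colors by $0,\dots,q-1$ are our assumptions.

```python
import random


def sample_tree_coloring(children, root, q, rng=random):
    """Sample a proper q-coloring of a finite rooted tree: the root color is uniform,
    and each child receives a uniform color among the q - 1 colors differing
    from its parent's color."""
    color = {root: rng.randrange(q)}
    stack = [root]
    while stack:
        i = stack.pop()
        for c in children.get(i, []):
            color[c] = rng.choice([a for a in range(q) if a != color[i]])
            stack.append(c)
    return color
```

Passing a seeded `random.Random` instance as `rng` makes the sample reproducible.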

We denote by ø the root of $T$ and by $\overline{B}_{ø}(t)$ the set of vertices of $T$ whose
distance from the root is at least $t$. Finally, for any subset of vertices $U$, we
let $\mu_U(\cdot)$ be the marginal law of the corresponding color assignments.

For small $\alpha$ the color at the root de-correlates from colors in $\overline{B}_{ø}(t)$ when
$t$ is large, whereas at large $\alpha$ they remain correlated at any distance $t$. The
'reconstruction threshold' separates these two regimes.

Definition 4.7. The reconstruction threshold $\alpha_r(q)$ is the maximal value of
$\alpha$ such that

\[ \lim_{t\to\infty} \mathbb{E}\big\{ \|\mu_{ø,\overline{B}_{ø}(t)} - \mu_{ø} \times \mu_{\overline{B}_{ø}(t)}\|_{TV} \big\} = 0 \tag{4.7} \]

(where the expectation is over the random tree $T$). If the limit on the
left-hand side is positive, we say that the reconstruction problem is solvable.

It is conjectured that the coexistence threshold $\alpha_d(q)$ for locally tree like
random graphs coincides with the reconstruction threshold $\alpha_r(q)$ for the
corresponding random trees. We next present a statistical physics argument
in favor of this conjecture. There are various non-equivalent versions of this
argument, all predicting the same location for the threshold. The argument
that we will reproduce was first developed in [21, 40, 69], to explore the
physics of glasses and spin glasses.

Note that the major difficulty in trying to identify the existence of 'lumps'
is that we do not know, a priori, where these lumps are in the space of
configurations. However, if $X^*$ is a configuration sampled from $\mu(\cdot)$, it will
fall inside one such lump so the idea is to study how a second configuration
$x$ behaves when tilted towards the first one. Specifically, fix $x^* = \{x^*_i \in \mathcal{X}_q :
i \in V\}$ and consider the tilted measures

\[ \mu^*_{G,x^*,\epsilon}(x) = \frac{1}{Z_\epsilon} \prod_{(i,j)\in E} \mathbb{I}(x_i \ne x_j) \prod_{i\in V} \psi_\epsilon(x^*_i,x_i) , \]

where $\psi_\epsilon(x,y)$ is a tilting function depending continuously on $\epsilon$, such that
$\psi_0(x,y) = 1$ (so $\mu^*_0$ reduces to the uniform measure over proper colorings),
and which favors $x = y$ when $\epsilon > 0$. For instance, we might take

\[ \psi_\epsilon(x,y) = \exp\big\{ \epsilon\,\mathbb{I}(x = y) \big\} . \]

While the study of the measure $\mu^*_{G,x^*,\epsilon}$ is beyond our current means, we
gain valuable insight from examining its Bethe approximation. Specifically,
in this setting messages depend in addition to the graph also on $x^*$ and $\epsilon$,
and the Bethe equations of Definition 3.4 are

\[ \nu_{i\to j}(x_i) = z_{i\to j}^{-1}\,\psi_\epsilon(x^*_i,x_i) \prod_{l\in\partial i\setminus j} \big(1 - \nu_{l\to i}(x_i)\big) , \tag{4.8} \]

with $z_{i\to j}$ a normalization constant. In shorthand we write this equation as

\[ \nu_{i\to j} = \mathsf{F}_\epsilon\{\nu_{l\to i} : l \in \partial i\setminus j\} . \]
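
A single application of the map in (4.8) can be written in a few lines. The sketch below assumes the exponential tilting function $\exp\{\epsilon\,\mathbb{I}(x=y)\}$ mentioned above as an example, represents each message as a list of $q$ probabilities indexed by colors $0,\dots,q-1$, and uses the hypothetical name `bp_update`.

```python
import math


def bp_update(incoming, x_star, eps, q):
    """One evaluation of the map in (4.8): `incoming` lists the messages nu_{l->i}
    for l in di minus j, and `x_star` is the reference color of vertex i.
    Assumes the tilting function exp(eps * 1{x = x_star})."""
    weights = [math.exp(eps * (x == x_star))
               * math.prod(1.0 - nu[x] for nu in incoming)
               for x in range(q)]
    z = sum(weights)  # the normalization constant z_{i->j}
    return [w / z for w in weights]
```

At $\epsilon = 0$ uniform incoming messages are mapped to the uniform message, while for $\epsilon > 0$ the output is biased towards the reference color.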

Let us now assume that $G$ is a regular graph of degree $k+1$ and that
$X^*$ is a uniformly random proper $q$-coloring of $G$. Then, the message $\nu_{i\to j}$ is
itself a random variable, taking values in the $(q-1)$-dimensional probability
simplex $\mathcal{M}(\mathcal{X}_q)$. For each $x \in \mathcal{X}_q$ we denote by $Q_x$ (which also depends on
$\epsilon$), the conditional law of $\nu_{i\to j}$ given that $X^*_i = x$. In formulae, for any Borel
measurable subset $A$ of $\mathcal{M}(\mathcal{X}_q)$, we have

\[ Q_x(A) \equiv \mathbb{P}\big\{ \nu_{i\to j}(\cdot) \in A \,\big|\, X^*_i = x \big\} . \]

Assume that, conditionally on the reference coloring $X^*$, the messages
$\nu_{l\to i}$ for $l \in \partial i\setminus j$ are asymptotically independent, and have the laws $Q_{X^*_l}$.
We then obtain the following recursion for $\{Q_x\}$,

\[ Q_x(A) = \sum_{x_1 \dots x_k} \mu(x_1,\dots,x_k|x) \int \mathbb{I}\big(\mathsf{F}_\epsilon(\nu_1,\dots,\nu_k) \in A\big) \prod_{i=1}^{k} Q_{x_i}(\mathrm{d}\nu_i) , \]

where $(x_1,\dots,x_k)$ denote the values of $(X^*_l,\ l \in \partial i\setminus j)$ and $\mu(x_1,\dots,x_k|x)$
the corresponding conditional marginal of $\mu = \mu^*_0$ given $X^*_i = x$. Assuming
further that for a random regular graph $G = G_{n,k+1}$ the measure
$\mu(x_1,\dots,x_k|x)$ converges as $n \to \infty$ to the analogous conditional law for
the regular $k$-ary tree, we obtain the fixed point equation

\[ Q_x(A) = \frac{1}{(q-1)^k} \sum_{x_1 \dots x_k \ne x} \int \mathbb{I}\big(\mathsf{F}_\epsilon(\nu_1,\dots,\nu_k) \in A\big) \prod_{i=1}^{k} Q_{x_i}(\mathrm{d}\nu_i) . \tag{4.9} \]
' xi...xi^j^x i=l 
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In the limit e = this equation admits a trivial degenerate solution, whereby 
Qx = ^ is concentrated on one point, the uniform vector V(x) = 1/q for 
all X £ Xq. The interpretation of this solution is that, as e | 0, a random 
coloring from the tilted measure fJ'Q x* becomes uncorrelated from the 
reference coloring X_* . 

It is not hard to verify that this is the only degenerate solution (namely,
where each measure $Q_x$ is supported on one point) of (4.9) at $\epsilon = 0$. A second
scenario is however possible. It might be that, as $\epsilon \downarrow 0$ (and, in particular,
for $\epsilon = 0$), Eq. (4.9) admits also a non-trivial solution, whereby at least
one of the measures $Q_x$ is not supported on the uniform vector $\bar\nu$. This is
interpreted by physicists as implying coexistence: the coloring sampled from
the tilted measure $\mu^*_{G,\underline{x}^*,\epsilon}$ remains trapped in the same 'state' (i.e. in the
same subset of configurations $\Omega_{\ell,n}$) as $\underline{X}^*$.
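One way to explore numerically whether Eq. (4.9) at $\epsilon = 0$ has a non-trivial solution is to iterate it on finite samples of messages. The sketch below uses a sampled-population iteration (often called population dynamics in the physics literature); this is a technique we bring in for illustration and not a procedure described in the text. The function names, the population size, the number of sweeps, and the initialization at point masses are all assumptions of this sketch.

```python
import random

def f0(messages, q):
    """Untilted map F_0: nu(x) proportional to prod_i (1 - nu_i(x))."""
    w = [1.0] * q
    for nu in messages:
        for x in range(q):
            w[x] *= 1.0 - nu[x]
    z = sum(w)
    return [wx / z for wx in w]

def population_dynamics(q, k, pop_size=200, sweeps=20, seed=0):
    """Iterate the recursion behind (4.9) at eps = 0 on sampled populations.

    pops[x] is a list of messages approximating Q_x. Starting from point
    masses on color x mimics full knowledge of the reference coloring.
    """
    rng = random.Random(seed)
    pops = [[[1.0 if y == x else 0.0 for y in range(q)]
             for _ in range(pop_size)] for x in range(q)]
    for _ in range(sweeps):
        new_pops = []
        for x in range(q):
            others = [y for y in range(q) if y != x]
            new = []
            for _ in range(pop_size):
                # Offspring colors are uniform among the q - 1 colors != x.
                colors = [rng.choice(others) for _ in range(k)]
                msgs = [rng.choice(pops[c]) for c in colors]
                new.append(f0(msgs, q))
            new_pops.append(new)
        pops = new_pops
    return pops

def mean_bias(pops, q):
    """Average of nu(x) - 1/q over Q_x; it vanishes at the trivial solution."""
    total = sum(nu[x] for x in range(q) for nu in pops[x])
    count = sum(len(pops[x]) for x in range(q))
    return total / count - 1.0 / q

pops = population_dynamics(q=4, k=2)
bias = mean_bias(pops, q=4)
```

A bias that stays bounded away from zero as the number of sweeps grows would suggest a non-trivial solution, while a bias decaying to zero is consistent with $Q_x = \delta_{\bar\nu}$. Finite populations add sampling noise, so such runs are only indicative.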

Let us summarize the statistical physics conjecture: the uniform measure
$\mu_G(\cdot)$ over proper $q$-colorings of a random $(k + 1)$-regular graph exhibits
coexistence if and only if Eq. (4.9) admits a non-trivial solution for $\epsilon = 0$.
In the next subsection we show that this happens if and only if $k > k_r(q)$,
with $k_r(q)$ the reconstructibility threshold on $k$-ary trees (which is defined
analogously to the Poisson tree threshold $\alpha_r(q)$, see Definition 4.7).



4.3.2. The reconstruction threshold for k-ary trees

We say that a probability measure on $\mathcal{M}(\mathcal{X}_q)$ is color-symmetric if it is
invariant under the action of color permutations on its argument $\nu \in \mathcal{M}(\mathcal{X}_q)$.
Following [66, Proposition 1] we proceed to show that the existence of certain
non-trivial solutions $\{Q_x\}$ of (4.9) at $\epsilon = 0$ is equivalent to solvability of the
corresponding reconstruction problem for $k$-ary trees.

Proposition 4.8. The reconstruction problem is solvable on $k$-ary trees if
and only if Eq. (4.9) admits at $\epsilon = 0$ a solution $\{Q_x, x \in \mathcal{X}_q\}$ such that each
$Q_x$ has the Radon-Nikodym density $q\nu(x)$ with respect to the same
color-symmetric, non-degenerate probability measure $Q$.

Proof. First notice that $\{Q_x, x \in \mathcal{X}_q\}$ is a solution of (4.9) at $\epsilon = 0$ if and
only if for any $x \in \mathcal{X}_q$ and bounded Borel function $g$,

$$\int g(\nu)\, q^{-1} Q_x(d\nu) = c_{q,k} \int g\big(F_0(\nu_1,\ldots,\nu_k)\big) \prod_{i=1}^{k} [Q_* - q^{-1} Q_x](d\nu_i) , \qquad (4.10)$$

where $Q_* = q^{-1} \sum_{x=1}^{q} Q_x$ and $c_{q,k} = q^{k-1}(q-1)^{-k}$. If this solution is of the
stated form, then $Q_* = Q$ and upon plugging $Q_x(d\nu) = q\nu(x) Q(d\nu)$ in the
identity (4.10) we see that for any bounded Borel function $h$,

$$\int h(\nu)\, Q(d\nu) = \int \frac{z(\nu_1,\ldots,\nu_k)}{z(\bar\nu,\ldots,\bar\nu)}\, h\big(F_0(\nu_1,\ldots,\nu_k)\big) \prod_{i=1}^{k} Q(d\nu_i) , \qquad (4.11)$$

where $z(\nu_1,\ldots,\nu_k) = \sum_{x=1}^{q} \prod_{i=1}^{k} (1 - \nu_i(x))$ is the normalization constant
of the mapping $F_0(\cdot)$ (so $c_{q,k} = 1/z(\bar\nu,\ldots,\bar\nu)$). Conversely, for any
color-symmetric probability measure $Q$ on $\mathcal{M}(\mathcal{X}_q)$ the value of $\int \nu(x) Q(d\nu)$ is
independent of $x \in \mathcal{X}_q$, hence $Q_x(d\nu) = q\nu(x) Q(d\nu)$ are then also probability
measures on $\mathcal{M}(\mathcal{X}_q)$ and such that $Q = Q_*$. Further, recall that for any
$x \in \mathcal{X}_q$ and $\nu_i \in \mathcal{M}(\mathcal{X}_q)$,

$$z(\nu_1,\ldots,\nu_k)\, F_0(\nu_1,\ldots,\nu_k)(x) = \prod_{i=1}^{k} (1 - \nu_i(x)) ,$$

so if such $Q$ satisfies (4.11), then considering there $h(\nu) = g(\nu)\nu(x)$ leads to
$\{Q_x\}$ satisfying (4.10).

If a solution $Q$ of (4.11) is degenerate, i.e. supported on one point $\nu$, then
$\nu = F_0(\nu,\ldots,\nu)$, hence $\nu = \bar\nu$. That is, any non-trivial solution $Q \neq \delta_{\bar\nu}$ is also
non-degenerate. We thus proceed to show that solvability of the reconstruction
problem on $k$-ary trees is equivalent to having a color-symmetric solution
$Q \neq \delta_{\bar\nu}$ of (4.11). To this end, consider a proper $q$-coloring $\underline{X} = \{X_v : v \in T\}$
of the $k$-ary tree, sampled at random according to the free boundary Gibbs
measure $\mu$. Let $\nu^{(t)}$ denote the marginal distribution of the root color given
the colors at generation $t$. In formulae, this is the $\mathcal{M}(\mathcal{X}_q)$-valued random
variable such that for $x \in \{1,\ldots,q\}$,

$$\nu^{(t)}(x) = \mu_{ø | \overline{B}_{ø}(t)}\big(x \,\big|\, \underline{X}_{\overline{B}_{ø}(t)}\big) = P\{X_{ø} = x \mid \underline{X}_{\overline{B}_{ø}(t)}\} .$$

Denote by $Q_x^{(t)}$ the conditional law of $\nu^{(t)}$ given the root value $X_{ø} = x$. The
$k$-ary tree of $(t + 1)$ generations is the merging at the root of $k$ disjoint
$k$-ary trees, each of which has $t$ generations. Thus, conditioning on the colors
$x_1,\ldots,x_k$ of the root's offspring, one finds that the probability measures
$\{Q_x^{(t)}\}$ satisfy for any $x$ and any bounded Borel function $h(\cdot)$ the recursion

$$\int h(\nu)\, Q_x^{(t+1)}(d\nu) = \frac{1}{(q-1)^k} \sum_{x_1 \ldots x_k \neq x} \int h\big(F_0(\nu_1,\ldots,\nu_k)\big) \prod_{i=1}^{k} Q_{x_i}^{(t)}(d\nu_i) ,$$

starting at $Q_x^{(0)} = \delta_{\nu_x}$, where $\nu_x$ denotes the probability vector that puts
weight one on the color $x$.

Let $Q^{(t)}$ denote the unconditional law of $\nu^{(t)}$. That is, $Q^{(t)} = q^{-1} \sum_{x=1}^{q} Q_x^{(t)}$.
By the tower property of the conditional expectation, for any $x \in \mathcal{X}_q$ and
bounded measurable function $h$ on $\mathcal{M}(\mathcal{X}_q)$,

$$\int h(\nu)\, Q_x^{(t)}(d\nu) = q\, E[h(\nu^{(t)}) I(X_{ø} = x)] = q\, E[h(\nu^{(t)}) \nu^{(t)}(x)] = q \int \nu(x) h(\nu)\, Q^{(t)}(d\nu) .$$

Consequently, $Q_x^{(t)}$ has the Radon-Nikodym derivative $q\nu(x)$ with respect to
$Q^{(t)}$. Plugging this into the recursion for $Q_x^{(t)}$ we find that $Q^{(t)}$ satisfies the
recursion relation

$$\int h(\nu)\, Q^{(t+1)}(d\nu) = \int \frac{z(\nu_1,\ldots,\nu_k)}{z(\bar\nu,\ldots,\bar\nu)}\, h\big(F_0(\nu_1,\ldots,\nu_k)\big) \prod_{i=1}^{k} Q^{(t)}(d\nu_i) , \qquad (4.12)$$

starting at $Q^{(0)} = q^{-1} \sum_{x=1}^{q} \delta_{\nu_x}$.

Note that for each $x \in \mathcal{X}_q$, the sequence $\{\nu^{(t)}(x)\}$ is a reversed martingale
with respect to the filtration $\mathcal{F}_{-t} = \sigma(\underline{X}_{\overline{B}_{ø}(t)})$, $t \geq 0$, hence by Lévy's
downward theorem, it has an almost sure limit. Consequently, the probability
measures $\{Q^{(t)}\}$ converge weakly to a limit $Q^{(\infty)}$.

As $Q^{(0)}$ is color-symmetric and the recursion (4.12) transfers the
color-symmetry of $Q^{(t)}$ to that of $Q^{(t+1)}$, we deduce that $Q^{(\infty)}$ is also
color-symmetric. Further, with the function $F_0 : \mathcal{M}(\mathcal{X}_q)^k \to \mathcal{M}(\mathcal{X}_q)$ continuous
at any point $(\nu_1,\ldots,\nu_k)$ for which $z(\nu_1,\ldots,\nu_k) > 0$, it follows from the
recursion (4.12) that $Q^{(\infty)}$ satisfies (4.11) for any continuous $h$, hence for any
bounded Borel function $h$. By definition,

$$\| \mu_{ø, \overline{B}_{ø}(t)} - \mu_{ø} \times \mu_{\overline{B}_{ø}(t)} \|_{TV} = \int \| \nu - \bar\nu \|_{TV}\, Q^{(t)}(d\nu) ,$$

and with $\mathcal{X}_q$ finite, the function $\nu \mapsto \|\nu - \bar\nu\|_{TV}$ is continuous. Hence, the
reconstruction problem is solvable if and only if $Q^{(\infty)} \neq \delta_{\bar\nu}$. That is, as
claimed, solvability implies the existence of a non-trivial color-symmetric
solution $Q^{(\infty)}$ of (4.11).

To prove the converse assume there exists a color-symmetric solution
$Q \neq \delta_{\bar\nu}$ of Eq. (4.11). Recall that in this case $Q_x(d\nu) = q\nu(x) Q(d\nu)$ are
probability measures such that $Q = q^{-1} \sum_{x=1}^{q} Q_x$. Further, if a random
variable $Y(t)$ is conditionally independent of $X_{ø}$ given $\underline{X}_{\overline{B}_{ø}(t)}$ then

$$\| \mu_{ø, Y(t)} - \mu_{ø} \times \mu_{Y(t)} \|_{TV} \leq \| \mu_{ø, \overline{B}_{ø}(t)} - \mu_{ø} \times \mu_{\overline{B}_{ø}(t)} \|_{TV}$$

(where $\mu_{ø, Y(t)}$ denotes the joint law of $X_{ø}$ and $Y(t)$). Turning to construct
such a random variable $Y(t) \in \mathcal{M}(\mathcal{X}_q)$, let $\partial B_{ø}(t)$ denote the vertices of
the tree at distance $t$ from $ø$ and set $\{\nu_i \in \mathcal{M}(\mathcal{X}_q)$ for $i \in \partial B_{ø}(t)\}$ to be
conditionally independent given $\underline{X}_{\overline{B}_{ø}(t)}$, with $\nu_i$ distributed according to the
random measure $Q_{X_i}(\cdot)$. Then, define recursively $\nu_v = F_0(\nu_{u_1},\ldots,\nu_{u_k})$ for
$v \in \partial B_{ø}(s)$, $s = t-1, t-2, \ldots, 0$, where $u_1,\ldots,u_k$ denote the offspring of $v$
in $T$. Finally, set $Y(t) = \nu_{ø}$.

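Under this construction, the law $P_{v,x}$ of $\nu_v$ conditional upon $X_v = x$ is
$Q_x$, for any $v \in \partial B_{ø}(s)$, $s = t,\ldots,0$. Indeed, clearly this is the case for
$s = t$ and proceeding recursively, assume it applies at levels $t,\ldots,s + 1$.
Then, as $\{Q_x, x \in \mathcal{X}_q\}$ satisfy (4.10), we see that for $v \in \partial B_{ø}(s)$ of offspring
$u_1,\ldots,u_k$, any $x \in \mathcal{X}_q$ and bounded Borel function $g(\cdot)$,

$$\int g(\nu)\, P_{v,x}(d\nu) = E\Big[ \int g\big(F_0(\nu_1,\ldots,\nu_k)\big) \prod_{i=1}^{k} Q_{X_{u_i}}(d\nu_i) \,\Big|\, X_v = x \Big]$$

$$= q\, c_{q,k} \int g\big(F_0(\nu_1,\ldots,\nu_k)\big) \prod_{i=1}^{k} [Q_* - q^{-1} Q_x](d\nu_i) = \int g(\nu)\, Q_x(d\nu) .$$

That is, $P_{v,x} = Q_x$, as claimed. In particular, $\mu_{Y(t)|ø} = Q_{X_{ø}}$, $\mu_{Y(t)} = Q$ and
with $Q_x(d\nu) = q\nu(x) Q(d\nu)$, it follows that

$$\| \mu_{ø, Y(t)} - \mu_{ø} \times \mu_{Y(t)} \|_{TV} = \frac{1}{q} \sum_{x=1}^{q} \| Q_x - Q \|_{TV} = \int \| \nu - \bar\nu \|_{TV}\, Q(d\nu) ,$$

which is independent of $t$ and strictly positive (since $Q \neq \delta_{\bar\nu}$). By the
preceding inequality, this is a sufficient condition for reconstructibility. □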



4.3.3. Complexity: exponential growth of the number of clusters

We provide next a heuristic derivation of the predicted value of the
complexity parameter $\Sigma = \Sigma(k)$ for proper $q$-colorings of a uniformly chosen random
regular graph $G = G_{n,k+1}$, as defined in Section 4.1, regime II, namely, when
$k_d(q) < k < k_c(q)$. This parameter is interpreted as the exponential growth
rate of the number of 'typical' lumps or 'clusters' to which the uniform
measure $\mu_G(\cdot)$ decomposes. Remarkably, we obtain an expression for $\Sigma(k)$ in
terms of the non-degenerate solution of (4.9) at $\epsilon = 0$.

Recall Definition 3.6 that the Bethe free entropy for proper $q$-colorings of
$G$ and a given (permissive) message set $\{\nu_{i\to j}\}$ is

$$\Phi\{\nu_{i\to j}\} = - \sum_{(i,j)\in E} \log\Big\{ 1 - \sum_{x=1}^{q} \nu_{i\to j}(x)\, \nu_{j\to i}(x) \Big\} + \sum_{i\in V} \log\Big\{ \sum_{x=1}^{q} \prod_{j \in \partial i} \big(1 - \nu_{j\to i}(x)\big) \Big\} . \qquad (4.13)$$

According to the Bethe-Peierls approximation, the logarithm of the number
$Z_n$ of proper $q$-colorings for $G = G_{n,k+1}$ is approximated for large $n$
by the value of $\Phi\{\nu_{i\to j}\}$ for a message set $\{\nu_{i\to j}\}$ which solves the
Bethe-Peierls equations (4.8) at $\epsilon = 0$. One trivial solution of these equations is
$\nu_{i\to j} = \bar\nu$ (the uniform distribution over $\{1,\ldots,q\}$), and for $G = G_{n,k+1}$ the
corresponding Bethe free entropy is

$$\Phi(\bar\nu) = n \Big[ -\frac{k+1}{2} \log\Big\{ 1 - \sum_{x=1}^{q} \bar\nu(x)^2 \Big\} + \log\Big\{ \sum_{x=1}^{q} (1 - \bar\nu(x))^{k+1} \Big\} \Big]$$

$$= n \Big[ \log q + \frac{k+1}{2} \log(1 - 1/q) \Big] . \qquad (4.14)$$
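The Bethe free entropy of proper colorings is simple to evaluate for a given message set. The sketch below is a minimal illustration under our own conventions (a dictionary `msg` keyed by directed edges, and the function name `bethe_free_entropy`); it checks the closed form for uniform messages on a small $(k+1)$-regular graph, here the complete graph on $k + 2$ vertices, chosen only because it is easy to write down.

```python
import math

def bethe_free_entropy(n, edges, q, msg):
    """Bethe free entropy of proper q-colorings for a given message set.

    edges: list of undirected pairs (i, j); msg[(i, j)] is the message
    nu_{i->j}, a length-q list summing to one.
    """
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    phi = 0.0
    for i, j in edges:  # edge terms: -log{1 - sum_x nu_{i->j}(x) nu_{j->i}(x)}
        overlap = sum(msg[(i, j)][x] * msg[(j, i)][x] for x in range(q))
        phi -= math.log(1.0 - overlap)
    for i in range(n):  # vertex terms: log{sum_x prod_j (1 - nu_{j->i}(x))}
        s = 0.0
        for x in range(q):
            p = 1.0
            for j in nbrs[i]:
                p *= 1.0 - msg[(j, i)][x]
            s += p
        phi += math.log(s)
    return phi

# Complete graph on k + 2 vertices: a small (k + 1)-regular example.
q, k = 4, 2
n = k + 2
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
uniform = [1.0 / q] * q
msg = {}
for i, j in edges:
    msg[(i, j)] = uniform
    msg[(j, i)] = uniform
phi_uniform = bethe_free_entropy(n, edges, q, msg)
closed_form = n * (math.log(q) + 0.5 * (k + 1) * math.log(1.0 - 1.0 / q))
```

The two quantities `phi_uniform` and `closed_form` agree up to floating point error, since a $(k+1)$-regular graph on $n$ vertices has $n(k+1)/2$ edges.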

As explained before, when $k_d(q) < k < k_s(q)$, upon fixing $n$ large enough,
a regular graph $G_n$ of degree $k+1$ and a reference proper $q$-coloring $\underline{x}^*$ of its
vertices, we expect Eq. (4.8) to admit a second solution $\{\nu^*_{i\to j}\}$ for all $\epsilon > 0$
small enough. In the limit $\epsilon \downarrow 0$, this solution is conjectured to describe the
uniform measure over proper $q$-colorings in the cluster $\Omega_{\ell,n}$ containing $\underline{x}^*$.
In other words, the restricted measure

$$\mu_{\ell,n}(\underline{x}) = \mu_{G_n}(\underline{x} \mid \Omega_{\ell,n}) = \frac{1}{Z_{\ell,n}} \prod_{(i,j)\in E} I(x_i \neq x_j)\, I(\underline{x} \in \Omega_{\ell,n}) , \qquad (4.15)$$

is conjectured to be Bethe approximated by such message set $\{\nu^*_{i\to j}\}$. One
naturally expects the corresponding free entropy approximation to hold as
well. That is, to have

$$\log Z_{\ell,n} = \Phi\{\nu^*_{i\to j}\} + o(n) .$$

As discussed in Section 4.1, in regime II, namely, for $k_d(q) < k < k_c(q)$, it
is conjectured that for a uniformly chosen proper $q$-coloring $\underline{X}^*$, the value of
$\log Z_{\ell,n}$ (for the cluster $\Omega_{\ell,n}$ containing $\underline{X}^*$), concentrates in probability
as $n \to \infty$, around a non-random value. Recall that $\log Z_n = \Phi(\bar\nu) + o(n)$, so
with most of the $Z_n$ proper $q$-colorings of $G_n$ comprised within the $e^{n\Sigma + o(n)}$
'typical' clusters $\Omega_{\ell,n}$, $\ell \in \mathrm{Typ}_n$, each having $Z_{\ell,n}$ proper $q$-colorings, we
conclude that

$$\Phi(\bar\nu) = \log Z_n + o(n) = \log\Big\{ \sum_{\ell=1}^{|\mathrm{Typ}_n|} Z_{\ell,n} \Big\} + o(n) = n\Sigma + E[\Phi\{\nu^*_{i\to j}\}] + o(n) , \qquad (4.16)$$

where the latter expectation is with respect to both the random graph $G_n$
and the reference configuration $\underline{X}^*$ (which together determine the message
set $\{\nu^*_{i\to j}\}$).

This argument provides a way to compute the exponential growth rate
$\Sigma(k)$ of the number of clusters, as

$$\Sigma(k) = \lim_{n\to\infty} n^{-1} \big\{ \Phi(\bar\nu) - E[\Phi\{\nu^*_{i\to j}\}] \big\} .$$



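For a uniformly chosen random proper $q$-coloring $\underline{X}^*$, the distribution of
$\{\nu^*_{i\to j}\}$ can be expressed in the $n \to \infty$ limit in terms of the corresponding
solution $\{Q_x\}$ of the fixed point equation (4.9) at $\epsilon = 0$. Specifically,
following the Bethe ansatz, we expect that for uniformly chosen $i \in [n]$, the law of
$\{\nu^*_{j\to i}, j \in \partial i\}$ conditional on $\{X_i^*, X_j^*, j \in \partial i\}$ converges as $n \to \infty$ to the
product measure $\prod_{j=1}^{k+1} Q_{X_j^*}$, and the law of $\{\nu^*_{i\to j}, \nu^*_{j\to i}\}$ conditional on $X_i^*$
and $X_j^*$ converges to the product measure $Q_{X_i^*} \times Q_{X_j^*}$. By the invariance of
the uniform measure over proper $q$-colorings to permutations of the colors,
for any edge $(i,j)$ of $G_{n,k+1}$, the pair $(X_i^*, X_j^*)$ is uniformly distributed over
the $q(q-1)$ choices of $x_i \neq x_j$ in $\mathcal{X}_q$. Moreover, for large $n$ the Bethe
approximation predicts that $\{X_i^*, X_j^*, j \in \partial i\}$ is nearly uniformly distributed over
the $q(q-1)^{k+1}$ choices of $x_j \in \mathcal{X}_q$, all of which are different from $x_i \in \mathcal{X}_q$.
Substituting (4.13) and (4.14) in the preceding formula for $\Sigma(k)$, we thus conclude that

$$\Sigma(k) = \frac{k+1}{2q(q-1)} \sum_{x_1 \neq x_2} \int W_e(\nu_1,\nu_2)\, Q_{x_1}(d\nu_1)\, Q_{x_2}(d\nu_2)$$

$$\qquad - \frac{1}{q(q-1)^{k+1}} \sum_{x=1}^{q} \sum_{x_1 \ldots x_{k+1} \neq x} \int W_v(\nu_1,\ldots,\nu_{k+1}) \prod_{j=1}^{k+1} Q_{x_j}(d\nu_j) , \qquad (4.17)$$

where

$$W_e(\nu_1,\nu_2) = \log\Big\{ \frac{1 - \sum_{x=1}^{q} \nu_1(x)\nu_2(x)}{1 - 1/q} \Big\} , \qquad (4.18)$$

$$W_v(\nu_1,\ldots,\nu_{k+1}) = \log\Big\{ \frac{1}{q} \sum_{x=1}^{q} \prod_{j=1}^{k+1} \frac{1 - \nu_j(x)}{1 - 1/q} \Big\} . \qquad (4.19)$$

5. Reconstruction and extremality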

As shown in Section 3.4, the Bethe-Peierls approximation applies for
permissive graph-specification pairs $(G, \psi)$ such that:

(a). The graph $G = (V,E)$ has large girth (and it often suffices for $G$ to
merely have a large girth in the neighborhood of most vertices).

(b). The dependence between the random vectors $\underline{X}_A$ and $\underline{X}_B$ is weak for
subsets $A$ and $B$ which are far apart on $G$ (indeed, we argued there
that 'extremality' is the appropriate notion for this property).

While these conditions suffice for the Bethe-Peierls approximation to hold on
general graphs with bounded degree, one wishes to verify them for specific
models on sparse random graphs. For condition (a) this can be done by
standard random graph techniques (c.f. Section 2.1), but checking condition
(b) is quite an intricate task. Thus, largely based on [43], we explore here
the extremality condition in the context of random sparse graphs.

Beyond the relevance of extremality for the Bethe-Peierls approximation,
it is interesting per se and can be rephrased in terms of the reconstruction
problem. In Section 4.3.1 we considered the latter in the case of proper
$q$-colorings, where it amounts to estimating the color of a distinguished (root)
vertex $ø \in V$ for a uniformly chosen proper coloring $\underline{X}$ of the given graph
$G = (V,E)$, when the colors $\{X_j, j \in U\}$ on a subset $U$ of vertices are
revealed. In particular, we want to understand whether revealing the colors
at large distance $t$ from the root induces a non-negligible bias on the
distribution of $X_{ø}$.

It turns out that, for a random Erdős-Rényi graph chosen uniformly from
the ensemble $G(\alpha,n)$, there exists a critical value $\alpha_r(q)$ such that
reconstruction is possible (in the sense of Definition 4.7) when the number of
edges per vertex $\alpha > \alpha_r(q)$, and impossible when $\alpha < \alpha_r(q)$. Recall from
Section 4.3 that the reconstruction threshold $\alpha_r(q)$ is conjectured to
coincide with the so-called 'clustering' threshold $\alpha_d(q)$. That is, the uniform
measure over proper $q$-colorings of these random graphs should exhibit
coexistence if and only if $\alpha_r(q) = \alpha_d(q) < \alpha < \alpha_c(q)$. As we will show, this
relation provides a precise determination of the clustering threshold.

More generally, consider a graph-specification pair $(G,\psi)$, with a
distinguished marked vertex $ø \in V$ (which we call hereafter the 'root' of $G$), and
a sample $\underline{X}$ from the associated graphical model $\mu_{G,\psi}(\underline{x})$ of (1.4). The
reconstructibility question asks whether 'far away' variables $\underline{X}_{\overline{B}_{ø}(t)}$ provide
non-negligible information about $X_{ø}$ (here $\overline{B}_{ø}(t)$ denotes the subset of
vertices $i \in V$ at distance $d(ø,i) \geq t$ from the root). This is quantified by the
following definition, where as usual, for $U \subseteq V$ we denote the corresponding
marginal distribution of $\underline{X}_U = \{X_j : j \in U\}$ by $\mu_U(\underline{x}_U)$.

Definition 5.1. The reconstruction problem is $(t, \epsilon)$-solvable (also called
$(t, \epsilon)$-reconstructible), for the graphical model associated with $(G,\psi)$ and
rooted at $ø \in V$, if

$$\| \mu_{ø, \overline{B}_{ø}(t)} - \mu_{ø} \times \mu_{\overline{B}_{ø}(t)} \|_{TV} \geq \epsilon . \qquad (5.1)$$

We say that the reconstruction problem is solvable (reconstructible), for
a given sequence $\{G_n\}$ of random graphs (and specified joint distributions
of the graph $G_n$, the specification $\psi$ on it, and the choice of $ø \in V_n$), if
for some $\epsilon > 0$ and all $t \geq 0$, the events $A_n(t)$ that the reconstruction
problem is $(t, \epsilon)$-solvable on $G_n$ occur with positive probability. That is, when
$\inf_t \limsup_{n\to\infty} P\{A_n(t)\} > 0$.

Remark 5.2. The inequality (5.1) fails when the connected component of
$ø$ in $G$ has diameter less than $t$. Hence, for sparse random graphs $G_n$, the
sequence $n \mapsto P\{A_n(t)\}$ is often bounded away from one (on account of
$ø$ possibly being in a small connected component).

The rationale for this definition is that the total variation distance on
the left hand side of Eq. (5.1) measures the information about $X_{ø}$ that the
variables in $\overline{B}_{ø}(t)$ provide. For instance, it is proportional to the difference
between the probability of correctly guessing $X_{ø}$ when knowing $\underline{X}_{\overline{B}_{ø}(t)}$ and
the a-priori probability of doing so without knowing $\underline{X}_{\overline{B}_{ø}(t)}$.
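For very small graphs the total variation distance in (5.1) can be computed by brute-force enumeration, which may help to make the definition concrete. The sketch below does this for the uniform measure over proper $q$-colorings; the function name `reconstruction_tv` and the choice of passing the revealed set as an explicit list `far` are our own illustrative conventions, and the exhaustive enumeration is only feasible for a handful of vertices.

```python
from itertools import product

def reconstruction_tv(n, edges, q, root, far):
    """|| mu_{root,far} - mu_root x mu_far ||_TV for uniform proper q-colorings."""
    joint = {}
    total = 0
    for x in product(range(q), repeat=n):
        if all(x[i] != x[j] for i, j in edges):
            key = (x[root], tuple(x[v] for v in far))
            joint[key] = joint.get(key, 0) + 1
            total += 1
    if total == 0:
        raise ValueError("graph has no proper q-coloring")
    p_root, p_far = {}, {}
    for (a, b), c in joint.items():
        p_root[a] = p_root.get(a, 0.0) + c / total
        p_far[b] = p_far.get(b, 0.0) + c / total
    tv = 0.0
    for a in p_root:
        for b in p_far:
            tv += abs(joint.get((a, b), 0) / total - p_root[a] * p_far[b])
    return 0.5 * tv

# Path 0 - 1 - 2 with q = 3 colors, root 0, revealed set {2}.
tv_path = reconstruction_tv(3, [(0, 1), (1, 2)], q=3, root=0, far=[2])
```

On this path there are 12 proper colorings; the two endpoints agree with probability $1/2$ (versus $1/3$ under independence), and the resulting distance is $1/6$.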

Note that non-reconstructibility is slightly weaker than the extremality 
condition of Section 3.4. Indeed, we require here a decay of the correlations 
between a vertex and an arbitrary subset of vertices at distance t from 
it, whereas in Definition 3.13, we require such decay for arbitrary subsets 
of vertices A and B of distance t apart. However, it is not hard to check 
that when proving Theorem 3.14 we only consider the extremality condition 
in cases where the size of the subset B does not grow with R (or with the 
size of G) and for graph sequences that converge locally to trees, this is in 
turn implied by a non-reconstructibility type condition, where $B = \{ø\}$ is a
single vertex.

Recall Section 2.1 that for a uniformly chosen root $ø$ and locally
tree-like sparse random graphs $G_n$, for any $t \geq 0$ fixed, the finite
neighborhood $B_{ø}(t)$ converges in distribution to a (typically random) tree. We expect
that with high probability the vertices on the corresponding boundary set
$\partial B_{ø}(t) = \{i \in B_{ø}(t) : \partial i \not\subseteq B_{ø}(t)\}$ are 'far apart' from each other in the
complementary subgraph $\overline{B}_{ø}(t)$. This suggests that for the graphical model
on $\overline{B}_{ø}(t)$, the variables $\{X_j, j \in \partial B_{ø}(t)\}$ are then weakly dependent, and
so approximating $G_n$ by its limiting tree structure might be a good way to
resolve the reconstruction problem. In other words, one should expect
reconstructibility on $G_n$ to be determined by reconstructibility on the associated
limiting random tree.

Beware that the preceding argument is circular, for we assumed that
variables on 'far apart' vertices (with respect to the residual graph $\overline{B}_{ø}(t)$)
are weakly dependent, in order to deduce the same for variables on vertices
that are 'far apart' in $G_n$. Indeed, its conclusion fails for many graphical
models. For example, [43] shows that the tree and graph reconstruction
thresholds do not coincide in the simplest example one can think of, namely,
ferromagnetic Ising models.

On the positive side, we show in the sequel that the tree and graph
reconstruction problems are equivalent under the sphericity condition of
Definition 4.1 (we phrased this definition in terms of proper colorings, but it applies
verbatim to general graphical models). More precisely, if for any $\epsilon, \delta > 0$,
the canonical measure $\mu(\cdot)$ is $(\epsilon, \delta)$-spherical with high probability (with
respect to the graph distribution), then the graph and tree reconstructions
do coincide. It can indeed be shown that, under the sphericity condition,
sampling $\underline{X}$ according to the graphical model on the residual graph $\overline{B}_{ø}(t)$
results in $\{X_j, j \in \partial B_{ø}(t)\}$ which are approximately independent.

This sufficient condition was applied in [43] to the Ising spin glass (where 
sphericity can be shown to hold as a consequence of a recent result by 
Guerra and Toninelli [51]). More recently, [74] deals with proper colorings 
of random graphs (building on the work of Achlioptas and Naor, in [4]). 
For a family of graphical models parametrized by their average degree, it is 
natural to expect reconstructibility to hold at large average degrees (as the 
graph is 'more connected'), but not at small average degrees (since the graph 
'falls' apart into disconnected components). We are indeed able to establish 
a threshold behavior (i.e. a critical degree value above which reconstruction 
is solvable) both for spin glasses and for proper colorings. 

5.1. Applications and related work 

Beyond its relation with the Bethe-Peierls approximation, the reconstruction 
problem is connected to a number of other interesting problems, two of which 
we briefly survey next. 

Markov Chain Monte Carlo (MCMC) algorithms provide a well established
way of approximating marginals of the distribution $\mu = \mu_{G,\psi}$ of (1.4).
The idea is to define an (irreducible and aperiodic) Markov chain whose
unique stationary distribution is $\mu(\cdot)$, so if this chain converges rapidly to
its stationary state (i.e., its mixing time is small), then it can be effectively
used to generate a sample $\underline{X}$ from $\mu(\cdot)$.
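As an illustration of such a chain with local updates, here is a minimal sketch of heat-bath (Glauber) dynamics for the uniform measure over proper $q$-colorings. The function name `glauber_colorings` and its interface are our own; the chain leaves the uniform measure over proper colorings invariant, but whether it is irreducible depends on $q$ and on the graph, which this sketch simply assumes.

```python
import random

def glauber_colorings(n, edges, q, coloring, steps, seed=0):
    """Heat-bath single-site dynamics for uniform proper q-colorings.

    Starts from a proper coloring; each step recolors a uniformly chosen
    vertex with a uniform color among those unused by its neighbors.
    """
    rng = random.Random(seed)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    x = list(coloring)
    for _ in range(steps):
        v = rng.randrange(n)
        used = {x[u] for u in nbrs[v]}
        allowed = [c for c in range(q) if c not in used]
        # Never empty: the current color of v is always allowed.
        x[v] = rng.choice(allowed)
    return x

# Cycle on 6 vertices with q = 4, started from an alternating coloring.
cycle_edges = [(i, (i + 1) % 6) for i in range(6)]
start = [i % 2 for i in range(6)]
final = glauber_colorings(6, cycle_edges, q=4, coloring=start, steps=1000)
```

Each update depends only on the colors in the neighborhood of the updated vertex, which is the locality property referred to in the discussion of mixing times.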

In many interesting cases, the chain is reversible and consists of local
updates (i.e. consecutive states differ in only few variables, with transition
probabilities determined by the restriction of the state to a neighborhood
in $G$ of the latter set). Under these conditions, the mixing time is known to
be related to the correlation decay properties of the stationary distribution
$\mu(\cdot)$ (see [35, 52]). With

$$\Delta(t;\underline{x}) = \| \mu_{ø | \overline{B}_{ø}(t)}(\,\cdot\, | \underline{x}_{\overline{B}_{ø}(t)}) - \mu_{ø}(\cdot) \|_{TV} , \qquad (5.2)$$

one usually requires in this context that the dependence between $X_{ø}$ and
$\underline{X}_{\overline{B}_{ø}(t)}$ decays uniformly, i.e. $\sup_{\underline{x}} \Delta(t;\underline{x}) \to 0$ as $t \to \infty$. On graphs with
sub-exponential growth, a fast enough (uniform) decay is necessary and
sufficient for fast mixing. However, for more general graphs, this uniform
decay is often too strong a requirement, which one might opt to replace by
the weaker assumption of non-reconstructibility (indeed, the inequality (5.1)
can be re-written as $E[\Delta(t;\underline{X})] \geq \epsilon$, where the expectation is with respect
to the random sample $\underline{X}$).

In this direction, it was shown in [13] that non-reconstructibility is a 
necessary condition for fast mixing. Though the converse may in general 
fail, non-reconstructibility is sufficient for rapid decay of the variance of 
local functions (which in physics is often regarded as the criterion for fast 
dynamics, see [75]). Further, for certain graphical models on trees, [13] shows 
that non-reconstructibility is equivalent to polynomial spectral gap, a result 
that is sharpened in [64] to the equivalence between non-reconstructibility 
and fast mixing (for these models on trees). 

Random constraint satisfaction problems. Given an instance of a 
constraint satisfaction problem (CSP), consider the uniform distribution 
over its solutions. As we have seen in Section 1.2.2, it takes the form (1.23), 
which is an immediate generalization of (1.4). 

Computing the marginal $\mu_{ø}(x_{ø})$ is useful both for finding a solution and
the number of solutions of such a CSP. Suppose we can generate only one
uniformly random solution $\underline{X}$. In general this is not enough for approximating
the law of $X_{ø}$ in a meaningful way, but one can try the following: First,
fix all variables 'far from $ø$' to take the same value as in the sampled
configuration, namely $\underline{X}_{\overline{B}_{ø}(t)}$. Then, compute the conditional distribution at $ø$
(which for locally tree-like graphs can be done efficiently via dynamic
programming). While the resulting distribution is in general not a good
approximation of $\mu_{ø}(\cdot)$, non-reconstructibility implies that it is, with high
probability, within total variation distance $\epsilon$ of $\mu_{ø}(\cdot)$. That is, non-reconstructibility
yields a good approximation of $\mu_{ø}(x_{ø})$ based on a single sample (namely, a
single uniformly random solution $\underline{X}$). The situation is even simpler under
the assumptions of our main theorem (Theorem 5.4), where the boundary
condition $\underline{X}_{\overline{B}_{ø}(t)}$ can be replaced by an i.i.d. uniform boundary condition.

We have explained in Section 4 why for a typical sparse random graph of 
large average degree one should expect the set of proper colorings to form 
well-separated 'clusters'. The same rationale should apply, at high constraint 
density, for the solutions of a typical instance of a CSP based on large, sparse 
random graphs (c.f. [6, 67, 71]). This in turn increases the computational 
complexity of sampling even one uniformly random solution. 

Suppose the set of solutions partitions into clusters and any two solutions
that differ on at most $n\epsilon$ vertices are in the same cluster. Then, knowing
the value of all 'far away' variables $\underline{X}_{\overline{B}_{ø}(t)}$ determines the cluster to which
the sample $\underline{X}$ belongs, which in turn provides some information on $X_{ø}$. The
preceding heuristic argument connects reconstructibility to the appearance
of well-separated solution clusters, a connection that has been studied for
example in [58, 65].

Reconstruction problems also emerge in a variety of other contexts: (i)
Phylogeny (where given some evolved genomes, one aims at reconstructing
the genome of their common ancestor, c.f. [28]); (ii) Network tomography
(where given end-to-end delays in a computer network, one aims to infer the
link delays in its interior, c.f. [14]); (iii) Gibbs measures theory (c.f. [16, 42]).

Reconstruction on trees: A brief survey.

The reconstruction problem is relatively well understood in case the graph
is a tree (see [76]). The fundamental reason for this is that then the canonical
measure $\mu(\underline{x})$ admits a simple description. More precisely, to sample $\underline{X}$ from
$\mu(\cdot)$, first sample the value of $X_{ø}$ according to the marginal law $\mu_{ø}(x_{ø})$,
then recursively for each node $j$, sample its children $\{X_l\}$ independently
conditional on their parent value.
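For proper $q$-colorings of the $k$-ary tree this sampling scheme, together with the recursion for the conditional root marginal used in the proof of Proposition 4.8, can be sketched in a few lines. The function names, the Monte Carlo sample size, and the parameter values below are illustrative assumptions; the recursion is the map $\nu(x) \propto \prod_i (1 - \nu_i(x))$, applied from generation $t$ up to the root.

```python
import random

def sample_and_reconstruct(q, k, t, rng):
    """Broadcast a proper coloring down t generations of the k-ary tree and
    return (root color, conditional law of the root given generation t)."""
    def recurse(color, depth):
        if depth == t:
            # A revealed vertex at generation t: point mass on its color.
            return [1.0 if y == color else 0.0 for y in range(q)]
        w = [1.0] * q
        for _ in range(k):
            # Each child is uniform among the q - 1 colors != parent color.
            child = rng.choice([y for y in range(q) if y != color])
            nu = recurse(child, depth + 1)
            for y in range(q):
                w[y] *= 1.0 - nu[y]
        z = sum(w)  # positive, since the true color keeps positive weight
        return [wy / z for wy in w]
    root = rng.randrange(q)
    return root, recurse(root, 0)

def tv_to_uniform(nu):
    """Total variation distance between nu and the uniform vector."""
    q = len(nu)
    return 0.5 * sum(abs(p - 1.0 / q) for p in nu)

rng = random.Random(1)
samples = [sample_and_reconstruct(q=3, k=2, t=4, rng=rng) for _ in range(200)]
# Monte Carlo estimate of the expected distance of the root marginal from uniform.
avg_tv = sum(tv_to_uniform(nu) for _, nu in samples) / len(samples)
```

Tracking `avg_tv` as $t$ grows gives a rough numerical indication of whether the boundary retains information about the root, up to Monte Carlo error.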

Because of this Markov structure, one can derive a recursive distributional
equation for the conditional marginal at the root $\nu^{(t)}(\cdot) = \mu_{ø | \overline{B}_{ø}(t)}(\,\cdot\, | \underline{X}_{\overline{B}_{ø}(t)})$
given the variable values at generation $t$ (just as we have done in the course
of proving Proposition 4.8). Note that $\nu^{(t)}(\cdot)$ is a random quantity even for
a deterministic graph $G_n$ (because $\underline{X}_{\overline{B}_{ø}(t)}$ is itself drawn randomly from the
distribution $\mu(\cdot)$). Further, it contains all the information in the boundary
about $X_{ø}$ (i.e. it is a 'sufficient statistic'), so the standard approach to
tree reconstruction is to study the asymptotic behavior of the distributional
recursion for $\nu^{(t)}(\cdot)$.

Indeed, following this approach, reconstructibility has been thoroughly
characterized for zero magnetic field Ising models on generic trees (c.f.
[16, 19, 38]). More precisely, for such a model on an infinite tree $T$ of
branching number $\mathrm{br}(T)$, the reconstruction problem is solvable if and only if
$\mathrm{br}(T)(\tanh\beta)^2 > 1$. For the cases we treat in the sequel, $\mathrm{br}(T)$ coincides
with the mean offspring number of any vertex, hence this result establishes
a sharp reconstruction threshold in terms of the average degree (or in terms
of the inverse temperature parameter $\beta$), that we shall generalize here to
random graphs.
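The criterion $\mathrm{br}(T)(\tanh\beta)^2 > 1$ is easy to evaluate numerically. The short sketch below (with hypothetical helper names of our own) checks the condition and solves $\mathrm{br}(T)(\tanh\beta)^2 = 1$ for the critical inverse temperature, assuming $\mathrm{br}(T) > 1$.

```python
import math

def ising_tree_reconstructible(branching, beta):
    """Criterion br(T) * tanh(beta)^2 > 1 for zero-field Ising models on trees."""
    return branching * math.tanh(beta) ** 2 > 1.0

def critical_beta(branching):
    """Inverse temperature solving br(T) * tanh(beta)^2 = 1 (needs br(T) > 1)."""
    return math.atanh(1.0 / math.sqrt(branching))

# Binary tree: br(T) = 2, critical value atanh(1/sqrt(2)), roughly 0.88.
beta_c = critical_beta(2.0)
```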

Reconstruction on general graphs poses new challenges, since it lacks such a
recursive description of sampling from the measure $\mu(\cdot)$. The result of [13]
allows for deducing non-reconstructibility from fast mixing of certain
reversible Markov chains with local updates. However, proving such fast
mixing is far from being an easy task, and in general the converse does not hold
(i.e. one can have slow mixing and non-reconstructibility).

A threshold $\lambda_r$ for fast mixing has been established in [77] for the
independent set model of (3.20), in case $G_n$ are random bipartite graphs. Arguing as
in [43], it can be shown that this is also the graph reconstruction threshold.
An analogous result was proved in [43] for the ferromagnetic Ising model and
random regular graphs (and it extends also to Poisson random graphs, see
[31]). In all of these cases, the graph reconstruction threshold does not
coincide with the tree reconstruction threshold, but coincides instead with the
tree 'uniqueness threshold' (i.e. the critical parameter such that the uniform
decorrelation condition $\sup_{\underline{x}} \Delta(t;\underline{x}) \to 0$ holds).

5.2. Reconstruction on graphs: sphericity and tree-solvability

For the sake of clarity, we focus hereafter on Poisson graphical models.
Specifying such an ensemble requires an alphabet $\mathcal{X}$, a density parameter
$\gamma > 0$, a finite collection of non-negative, symmetric functionals $\psi_a(\cdot,\cdot)$ on
$\mathcal{X} \times \mathcal{X}$, indexed by $a \in \mathcal{C}$, and a probability distribution $\{p(a) : a \in \mathcal{C}\}$ on
$\mathcal{C}$. In the random multi-graph $G_n$ the multiplicities of edges between pairs
of vertices $i \neq j \in [n]$ are independent Poisson($2\gamma/n$) random variables,
and $G_n$ has additional independent Poisson($\gamma/n$) self-loops at each vertex
$i \in [n]$. For each occurrence of an edge $e = \{e_1, e_2\}$ in $G_n$ (including its
self-loops), we draw an independent random variable $A_e \in \mathcal{C}$ according to
the distribution $\{p(\cdot)\}$ and consider the graphical model of specification
$\psi = \{\psi_{A_e}(x_{e_1}, x_{e_2}) : e \in G_n\}$. Finally, the root $ø$ is uniformly chosen in $[n]$,
independently of the graph-specification pair $(G_n,\psi)$.

For example, the uniform measure over proper $q$-colorings fits this
framework (simply take $\mathcal{X} = \mathcal{X}_q$ and $|\mathcal{C}| = 1$ with $\psi(x,y) = I(x \neq y)$).
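The Poisson ensemble is straightforward to simulate. The following sketch draws the edge multiplicities, the edge labels $A_e$ and the root; the function names, the representation of edges as triples `(i, j, a)`, and the use of Knuth's elementary Poisson sampler (chosen to keep the example within the standard library) are our own assumptions.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_poisson_model(n, gamma, spec_probs, seed=0):
    """Sample the multi-graph, the edge labels A_e and the root.

    spec_probs maps each label a to its probability p(a).
    Returns (edges, root) with edges a list of triples (i, j, a), i <= j.
    """
    rng = random.Random(seed)
    labels = list(spec_probs)
    weights = [spec_probs[a] for a in labels]
    edges = []
    for i in range(n):
        for j in range(i, n):
            # Self-loops have mean gamma / n, other pairs mean 2 * gamma / n.
            mean = gamma / n if i == j else 2.0 * gamma / n
            for _ in range(poisson(mean, rng)):
                a = rng.choices(labels, weights=weights)[0]
                edges.append((i, j, a))
    root = rng.randrange(n)
    return edges, root

edges, root = sample_poisson_model(n=200, gamma=1.5, spec_probs={"a": 0.5, "b": 0.5})
```

The expected number of edges (counting multiplicities and self-loops) is $\gamma n$, so for these parameters the sample typically contains about 300 edges.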

It is easy to couple the multi-graph $G_n$ of the Poisson model and the
Erdős-Rényi random graph from the ensemble $G(\gamma, n)$ such that the two
graphs differ in at most $\Delta_n = \sum_{1 \leq i \leq j \leq n} Y_{\{i,j\}}$ edges, where the independent
variables $Y_{\{i,j\}}$ have the Poisson($\gamma/n$) distribution when $i = j$ and that of
$(\mathrm{Poisson}(2\gamma/n) - 1)_+$ when $i \neq j$. It is not hard to check that $\Delta_n/(\log n)$
is almost surely uniformly bounded, and hence by Proposition 2.6, almost
surely the Poisson multi-graphs $\{G_n\}$ are uniformly sparse and converge
locally to the Galton-Watson tree $T$, rooted at $ø$, of Poisson($2\gamma$) offspring
distribution. Let $T(\ell)$, $\ell \geq 0$ denote the graph-specification pair on the first
$\ell$ generations of $T$, where each edge carries the specification $\psi_a(\cdot,\cdot)$ with
probability $p(a)$, independently of all other edges and of the realization of
$T$.

It is then natural to ask whether reconstructibility of the original graphical
models is related to reconstructibility of the graphical models $\mu^{T(\ell)}(\underline{x})$ per
Eq. (1.4) for $G = T(\ell)$ and the same specification $\psi$.

Definition 5.3. Consider a sequence of random graphical models $\{G_n\}$
converging locally to the random rooted tree $T$. We say that the reconstruction
problem is tree-solvable for the sequence $\{G_n\}$ if it is solvable for $\{T(\ell)\}$.
That is, there exists $\epsilon > 0$ such that, as $\ell \to \infty$, for any $t \geq 0$,

$$\| \mu^{T(\ell)}_{ø, \overline{B}_{ø}(t)} - \mu^{T(\ell)}_{ø} \times \mu^{T(\ell)}_{\overline{B}_{ø}(t)} \|_{TV} \geq \epsilon , \qquad (5.3)$$

with positive probability.

This definition could have been expressed directly in terms of the free
boundary Gibbs measure $\mu^{T}$ on the infinite rooted tree $T$. Indeed, the
reconstruction problem is tree-solvable if and only if with positive probability

$$\liminf_{t \to \infty} \| \mu^{T}_{ø, \overline{B}_{ø}(t)} - \mu^{T}_{ø} \times \mu^{T}_{\overline{B}_{ø}(t)} \|_{TV} > 0 .$$

While Eqs. (5.3) and (5.1) are similar, as explained before, passing from the
original graph to the tree is a significant simplification (due to the recursive
description of sampling from $\mu^{T(\ell)}(\cdot)$).

We proceed with a sufficient condition for graph-reconstruction to be
equivalent to tree reconstruction. To this end, we introduce the concept
of 'two-replicas type' as follows. Consider a graphical model $G$ and two
i.i.d. samples $\underline{X}^{(1)}$, $\underline{X}^{(2)}$ from the corresponding canonical measure $\mu(\cdot) =
\mu_{(G,\psi)}(\cdot)$ (we will call them replicas following the spin glass terminology).
The two-replicas type is a matrix $\{\nu(x,y) : x,y \in \mathcal{X}\}$ where $\nu(x,y)$ counts
the fraction of vertices $j$ such that $X_j^{(1)} = x$ and $X_j^{(2)} = y$. We denote by $\mathcal{R}$
the set of distributions $\nu$ on $\mathcal{X} \times \mathcal{X}$ and by $\mathcal{R}_n$ the subset of valid two-replicas
types, that is, distributions $\nu$ with $n\nu(x,y) \in \mathbb{N}$ for all $x,y \in \mathcal{X}$.

The matrix $\nu = \nu_n$ is a random variable, because the graph $G_n$ is random,
and the two replicas $\underline{X}^{(1)}$, $\underline{X}^{(2)}$ are i.i.d. conditional on $G_n$. If $\mu(\cdot)$ was
the uniform distribution, then $\nu_n$ would concentrate (for large $n$) around
$\bar\nu(x,y) = 1/|\mathcal{X}|^2$. Our sufficient condition requires this to be approximately
true.
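Computing the two-replicas type from a pair of configurations is a simple counting exercise, as the following minimal sketch shows. The function names and the dictionary representation of the matrix are our own illustrative choices.

```python
def two_replica_type(x1, x2, alphabet):
    """Fraction of vertices j with (x1[j], x2[j]) = (a, b), for all pairs (a, b)."""
    n = len(x1)
    counts = {(a, b): 0 for a in alphabet for b in alphabet}
    for a, b in zip(x1, x2):
        counts[(a, b)] += 1
    return {pair: c / n for pair, c in counts.items()}

def deviation_from_uniform(nu, alphabet_size):
    """Entrywise difference between nu and the uniform matrix 1 / |X|^2."""
    return {pair: v - 1.0 / alphabet_size ** 2 for pair, v in nu.items()}

# Two configurations on 4 vertices over the alphabet {0, 1, 2}.
nu = two_replica_type([0, 1, 2, 0], [0, 2, 2, 1], alphabet=range(3))
delta = deviation_from_uniform(nu, alphabet_size=3)
```

Here the pairs $(0,0)$, $(1,2)$, $(2,2)$ and $(0,1)$ each occur once, so each has weight $1/4$ and all other entries vanish; the entries of `delta` sum to zero, as they must for the difference of two probability distributions.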

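Theorem 5.4. Consider a sequence of random Poisson graphical models
$\{G_n\}$. Let $\nu_n(\cdot,\cdot)$ be the type of two i.i.d. replicas $\underline{X}^{(1)}$, $\underline{X}^{(2)}$, and
$\Delta\nu_n(x,y) \equiv \nu_n(x,y) - \overline{\nu}(x,y)$. Assume that, for any $x \in \mathcal{X}$,

$$\lim_{n \to \infty} \mathbb{E}\Big\{ \Big[ \Delta\nu_n(x,x) - 2|\mathcal{X}|^{-1} \sum_{x'} \Delta\nu_n(x,x') \Big]^2 \Big\} = 0 . \tag{5.4}$$

Then, the reconstruction problem for $\{G_n\}$ is solvable if and only if it is
tree-solvable.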
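
To make the two-replica type and the quantity controlled by condition (5.4) concrete, here is a minimal Python sketch. It assumes the two replicas are handed in as plain sequences (no Gibbs sampling is performed), and the names `two_replica_type` and `sphericity_statistic` are illustrative choices, not notation from the text. The second function evaluates $\Delta\nu(x,x) - 2|\mathcal{X}|^{-1}\sum_{x'}\Delta\nu(x,x')$ exactly with rational arithmetic.

```python
from collections import Counter
from fractions import Fraction


def two_replica_type(x1, x2, alphabet):
    """Joint type nu(x, y): fraction of vertices j with x1[j] == x and x2[j] == y."""
    if len(x1) != len(x2) or not x1:
        raise ValueError("replicas must be non-empty and of equal length")
    n = len(x1)
    counts = Counter(zip(x1, x2))
    return {(x, y): Fraction(counts[(x, y)], n) for x in alphabet for y in alphabet}


def sphericity_statistic(nu, alphabet, x):
    """Delta nu(x,x) - (2/|X|) * sum_{x'} Delta nu(x,x'), where Delta nu = nu - 1/|X|^2."""
    q = len(alphabet)
    uniform = Fraction(1, q * q)
    diagonal = nu[(x, x)] - uniform
    row_sum = sum(nu[(x, y)] - uniform for y in alphabet)
    return diagonal - Fraction(2, q) * row_sum
```

For two perfectly "uncorrelated" binary replicas such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` the type is uniform and the statistic vanishes, whereas two identical replicas give a non-zero value.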

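Remark 5.5. The expectation in Eq. (5.4) is with respect to the two replicas
$\underline{X}^{(1)}$, $\underline{X}^{(2)}$ (which the type $\nu_n(\cdot,\cdot)$ is a function of), conditional on $G_n$, as
well as with respect to $G_n$. Explicitly,

$$\mathbb{E}\{F(\underline{X}^{(1)}, \underline{X}^{(2)})\} = \mathbb{E}\Big\{ \sum_{\underline{x}^{(1)}, \underline{x}^{(2)}} \mu_{G_n}(\underline{x}^{(1)})\, \mu_{G_n}(\underline{x}^{(2)})\, F(\underline{x}^{(1)}, \underline{x}^{(2)}) \Big\} . \tag{5.5}$$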

Remark 5.6. It is easy to see that the sphericity condition of Definition 4.1
implies Eq. (5.4). That is, (5.4) holds if $\mu_{G_n}$ is $(\varepsilon, \delta_n)$-spherical for any
$\varepsilon > 0$ and some $\delta_n(\varepsilon) \to 0$.

Remark 5.7. In fact, as is hinted by the proof, condition (5.4) can be
weakened, e.g. $\overline{\nu}(\cdot,\cdot)$ can be chosen more generally than the uniform matrix.
Such a generalization amounts to assuming that 'replica symmetry is not 
broken' (in the spin glass terminology, see [65]). For the sake of simplicity 
we omit such generalizations. 

Condition (5.4) emerges naturally in a variety of contexts, a notable
one being the second moment method applied to random constraint
satisfaction problems. As an example, consider proper colorings of random
graphs, cf. Section 4. The second moment method was used in [5] to bound from
below the colorability threshold. The reconstruction threshold on trees was
estimated in [15, 83]. Building on these results, and as outlined at the end
of Section 5.3, the following statement is obtained in [74].

Theorem 5.8. For proper $q$-colorings of a Poisson random graph of density
$\gamma$, the reconstruction problem is solvable if and only if $\gamma > \gamma_r(q)$, where
for large $q$,

$$\gamma_r(q) = \tfrac{1}{2}\, q \, [\log q + \log\log q + o(1)] . \tag{5.6}$$

In general the graph and tree reconstruction thresholds do not coincide.
For example, as mentioned before, zero magnetic field ferromagnetic Ising
models on the Galton-Watson tree $T(P, \rho, \infty)$ (of Section 2) are solvable if
and only if $\overline{\rho}\,(\tanh(\beta))^2 > 1$. The situation changes dramatically for graphs,
as shown in [31, 43].

Theorem 5.9. For both Poisson random graphs and random regular graphs,
reconstruction is solvable for zero magnetic field, ferromagnetic Ising models,
if and only if $\overline{\rho}\,\tanh(\beta) > 1$.

In physicists' language, the ferromagnetic phase transition occurring at
$\overline{\rho}\,\tanh(\beta) = 1$, cf. Section 2, 'drives' the reconstruction threshold. The proof
of reconstructibility for $\overline{\rho}\,\tanh(\beta) > 1$ essentially amounts to finding a
bottleneck in Glauber dynamics. As a consequence it immediately implies that
the mixing time is exponential in this regime. We expect this to be a tight
estimate of the threshold for exponential mixing.

On the other hand, for a zero magnetic field Ising spin-glass, the tree and
graph thresholds do coincide. In fact, for such a model on a Galton-Watson
tree with Poisson($2\gamma$) offspring distribution, reconstruction is solvable if and
only if $2\gamma(\tanh(\beta))^2 > 1$ (see [38]). The corresponding graph result is:

Theorem 5.10. Reconstruction is solvable for Ising spin-glasses of zero
magnetic field, on a Poisson random graph of density parameter $\gamma$, provided
$2\gamma(\tanh(\beta))^2 > 1$, and it is unsolvable if $2\gamma(\tanh(\beta))^2 < 1$.

5.3. Proof of main results 

Hereafter, let $B_i(t) = \{j \in [n] : d(i,j) \le t\}$, $\overline{B}_i(t) = \{j \in [n] : d(i,j) \ge t\}$
and $D_i(t) = B_i(t) \cap \overline{B}_i(t)$ (i.e. the set of vertices at distance $t$ from $i$).
Further, partition the edges of $G_n$ between the subgraphs $B_i(t)$ and $\overline{B}_i(t)$
so that edges between two vertices from $D_i(t)$ are all in $\overline{B}_i(t)$, and excluded
from $B_i(t)$.

Beyond the almost sure convergence of the law of $B_ø(t)$ to the corresponding
Galton-Watson tree of depth $t$, rooted at ø (which as explained before,
is a consequence of Proposition 2.6), the proof of Theorem 5.4 relies on the
following form of independence between $B_ø(t)$ and $\overline{B}_ø(t)$ for Poisson random
graphs.

Proposition 5.11. Let $G_n$ be a Poisson random graph on vertex set $[n]$ and
density parameter $\gamma$. Then, conditional on $B_ø(t)$, $\overline{B}_ø(t)$ is a Poisson random
graph on vertex set $[n] \setminus B_ø(t-1)$ with the same edge distribution as $G_n$.

Proof. Condition on $B_ø(t) = G(t)$, and let $G(t-1) = B_ø(t-1)$ (notice that
this is uniquely determined from $G(t)$). This is equivalent to conditioning on
a given edge realization between the vertices $k$, $l$ such that $k \in G(t-1)$ and
$l \in G(t)$.

The graph $\overline{B}_ø(t)$ has as vertices the set $[n] \setminus G(t)$ and its edges are those
$(k,l) \in G_n$ such that $k, l \notin G(t-1)$. Since the latter set of edges is disjoint
from the one we are conditioning upon, the claim follows by the independence
of the choice of edges taken into $G_n$. □

We also need to bound the tail of the distribution of the number of vertices
in the depth-$t$ neighborhood of ø. This can be done by comparison with a
Galton-Watson process.

Proposition 5.12. Let $\|B_ø(t)\|$ denote the number of edges (counting their
multiplicities) in the depth-$t$ neighborhood of the root in a Poisson random
graph $G_n$ of density $\gamma$. Then, for any $\lambda > 0$ there exists finite $g_t(\lambda,\gamma)$ such
that, for any $n$, $M \ge 0$,

$$\mathbb{P}\{\|B_ø(t)\| \ge M\} \le g_t(\lambda,\gamma)\, \lambda^{-M} . \tag{5.7}$$

Proof. Notice that, because of the symmetry of the graph distribution under
permutation of the vertices, we can and shall fix ø to be a deterministic
vertex. Starting at ø we explore $G_n$ in breadth-first fashion and consider
the sequence of random variables $E_t = \|B_ø(t)\|$. Then, for each $t \ge 0$, the
value of $E_{t+1} - E_t$ is, conditional on $B_ø(t)$, upper bounded by the sum of
$|D_ø(t)| \times |\overline{B}_ø(t)|$ i.i.d. Poisson($2\gamma/n$) random variables. Since $|\overline{B}_ø(t)| \le n$
and $|D_ø(t)| \le E_t - E_{t-1}$ for $t \ge 1$ (with $|D_ø(0)| = 1$), it follows that $E_t$ is
stochastically dominated by $|T(t)|$, where $T(t)$ is a depth-$t$ Galton-Watson
tree with Poisson($2\gamma$) offspring distribution. By Markov's inequality,

$$\mathbb{P}\{\|B_ø(t)\| \ge M\} \le \mathbb{E}\{\lambda^{|T(t)|}\}\, \lambda^{-M} .$$

To complete the proof, recall that $g_t(\lambda,\gamma) \equiv \mathbb{E}\{\lambda^{|T(t)|}\}$ is the finite
solution of the recursion $g_{t+1}(\lambda,\gamma) = \lambda\, \xi(g_t(\lambda,\gamma), \gamma)$ for
$\xi(\lambda,\gamma) = e^{2\gamma(\lambda-1)}$ and $g_0(\lambda,\gamma) = \lambda$. □
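
As a numerical illustration of the recursion closing the proof, the sketch below iterates $g_{t+1} = \lambda \exp(2\gamma (g_t - 1))$ from $g_0 = \lambda$ and evaluates the resulting tail bound $g_t(\lambda,\gamma)\lambda^{-M}$ of Eq. (5.7). The function names are hypothetical, and plain floating point is used, so for large $t$, $\lambda$ or $\gamma$ the iterates can overflow.

```python
import math


def g(t, lam, gamma):
    """Iterate g_{s+1} = lam * exp(2*gamma*(g_s - 1)) starting from g_0 = lam."""
    value = lam
    for _ in range(t):
        value = lam * math.exp(2.0 * gamma * (value - 1.0))
    return value


def tail_bound(t, lam, gamma, M):
    """Right-hand side g_t(lam, gamma) * lam**(-M) of the bound (5.7)."""
    return g(t, lam, gamma) * lam ** (-M)
```

In practice one would scan over a few values of `lam` and keep the smallest bound, since (5.7) holds for each of them.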

In order to prove Theorem 5.4 we will first establish that, under condition
(5.4), any (fixed) subset of the variables $\{X_1, \ldots, X_n\}$ is (approximately)
uniformly distributed. This is, at first sight, a surprising fact. Indeed, the
condition (5.4) only provides direct control on two-variable correlations. It
turns out that two-variable correlations control $k$-variable correlations for
any bounded $k$ because of the symmetry among $X_1, \ldots, X_n$. To clarify this
point, it is convenient to take a more general point of view.

Definition 5.13. For any distribution $\mu(\cdot)$ over $\mathcal{X}^n$ (where $\mathcal{X}$ is a generic
measure space), and any permutation $\pi$ over the set $\{1, \ldots, n\}$, let $\mu^{\pi}(\cdot)$
denote the distribution obtained acting with $\pi$ on $\mathcal{X} \times \cdots \times \mathcal{X}$.

Let $\mu(\cdot)$ be a random probability distribution over $\mathcal{X} \times \cdots \times \mathcal{X}$. We say
that $\mu$ is stochastically exchangeable if $\mu$ is distributed as $\mu^{\pi}$ for any
permutation $\pi$.

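Proposition 5.14. Suppose (5.4) holds for a finite set $\mathcal{X}$ and the type
$\nu_n$ of two i.i.d. replicas $\underline{X}^{(1)}$, $\underline{X}^{(2)}$ from a sequence of stochastically
exchangeable random measures $\mu^{(n)}$ on $\mathcal{X}^n$. Then, for any fixed set of vertices
$i(1), \ldots, i(k) \subseteq [n]$ and any $\xi_1, \ldots, \xi_k \in \mathcal{X}$, as $n \to \infty$,

$$\mathbb{E}\big\{ \big| \mu^{(n)}_{i(1),\ldots,i(k)}(\xi_1, \ldots, \xi_k) - |\mathcal{X}|^{-k} \big|^2 \big\} \to 0 . \tag{5.8}$$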

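Proof. Per given replicas $\underline{X}^{(1)}$, $\underline{X}^{(2)}$, we define, for any $\xi \in \mathcal{X}$ and $i \in [n]$,

$$Q_i(\xi) = \Big\{ \mathbb{I}(X^{(1)}_i = \xi) - \frac{1}{|\mathcal{X}|} \Big\} \Big\{ \mathbb{I}(X^{(2)}_i = \xi) - \frac{1}{|\mathcal{X}|} \Big\}$$

and let $Q(\xi) = n^{-1} \sum_{i=1}^{n} Q_i(\xi)$ denote the average of $Q_i(\xi)$ over a uniformly
random $i \in [n]$. Since

$$Q(\xi) = \Delta\nu_n(\xi,\xi) - |\mathcal{X}|^{-1} \sum_{x'} \Delta\nu_n(\xi,x') - |\mathcal{X}|^{-1} \sum_{x'} \Delta\nu_n(x',\xi) ,$$

it follows from (5.4) and the triangle inequality that $\mathbb{E}\{Q(\xi)^2\} \to 0$ as
$n \to \infty$. Further, $|Q(\xi)| \le 1$, so by the Cauchy-Schwarz inequality we deduce
that for any fixed, non-empty $U \subseteq [n]$, $b \in U$ and $\xi_a \in \mathcal{X}$,

$$\Big| \mathbb{E}\Big\{ \prod_{a \in U} Q(\xi_a) \Big\} \Big| \le \mathbb{E}|Q(\xi_b)| \to 0 .$$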

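Next, fixing $i(1), i(2), \ldots, i(k)$ and $U \subseteq [k]$, let

$$Y_U \equiv \mathbb{E}_{\mu}\Big\{ \prod_{a \in U} \big( \mathbb{I}(X_{i(a)} = \xi_a) - |\mathcal{X}|^{-1} \big) \Big\} ,$$

where $\mathbb{E}_{\mu}\{\cdot\}$ denotes the expectation with respect to the measure $\mu(\cdot)$ of
the replicas $\underline{X}^{(1)}$, $\underline{X}^{(2)}$, i.e. at fixed realization of $\mu = \mu^{(n)}$. Note that by the
stochastic exchangeability of $\mu$, and since $\sup_{\xi} |Q(\xi)| \le 1$, we have that for
any non-empty $U \subseteq [k]$,

$$\mathbb{E}\{Y_U^2\} = \mathbb{E}\Big\{ \prod_{a \in U} Q_{i(a)}(\xi_a) \Big\} = \mathbb{E}\Big\{ \prod_{a \in U} Q(\xi_a) \Big\} + \Delta_{U,n} ,$$

where $|\Delta_{U,n}|$ is upper bounded by the probability that $|U| \le k$ independent
uniform in $[n]$ random variables are not distinct, which is $O(1/n)$. Thus,
$\mathbb{E}\{Y_U^2\} \to 0$ as $n \to \infty$, for any fixed, non-empty $U \subseteq [k]$.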

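The proof of the proposition is completed by noting that $Y_{\emptyset} = 1$ and

$$\mu_{i(1),\ldots,i(k)}(\xi_1, \ldots, \xi_k) = \sum_{U \subseteq [k]} |\mathcal{X}|^{|U|-k}\, Y_U ,$$

hence by the Cauchy-Schwarz inequality,

$$\mathbb{E}\big\{ \big| \mu_{i(1),\ldots,i(k)}(\xi_1, \ldots, \xi_k) - |\mathcal{X}|^{-k} \big|^2 \big\} \le \sum_{\emptyset \ne U, V \subseteq [k]} \mathbb{E}|Y_U Y_V| \le 2^k \sum_{\emptyset \ne U \subseteq [k]} \mathbb{E}(Y_U^2)$$

goes to zero as $n \to \infty$. □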

The following lemma is the key for relating the solvability of the recon- 
struction problem to its tree-solvability. 

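Lemma 5.15. For any graphical model $\mu = \mu_{G_n,\psi}$, any vertex ø $\in [n]$, and
all $t \le \ell$,

$$\Big| \big\| \mu_{ø,\overline{B}_ø(t)} - \mu_{ø} \times \mu_{\overline{B}_ø(t)} \big\|_{TV} - \big\| \mu^{<}_{ø,\overline{B}_ø(t)} - \mu^{<}_{ø} \times \mu^{<}_{\overline{B}_ø(t)} \big\|_{TV} \Big| \le 5\, |\mathcal{X}|^{|B_ø(\ell)|}\, \big\| \mu^{>}_{D_ø(\ell)} - \rho_{D_ø(\ell)} \big\|_{TV} , \tag{5.9}$$

where for any $U \subseteq [n]$, we let $\rho_U(x_U) = 1/|\mathcal{X}|^{|U|}$ denote the uniform
distribution of $x_U$, with $\mu^{>}_U$ denoting the marginal law of $x_U$ for the graphical
model in which the edges of $B_ø(\ell)$ are omitted, whereas $\mu^{<}_U$ denotes such
marginal law in case all edges of $\overline{B}_ø(\ell)$ are omitted.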

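Proof. Adopting hereafter the shorthands $B(t)$, $\overline{B}(t)$ and $D(t)$ for $B_ø(t)$, $\overline{B}_ø(t)$
and $D_ø(t)$, respectively, recall that by the definition of these sets there are
no edges in $G_n$ between $B(t)$ and $\overline{B}(t) \setminus D(t)$. Hence, $\Delta(t,x)$ of Eqn. (5.2)
depends only on $x_{D(t)}$ and consequently,

$$\begin{aligned}
\big\| \mu_{ø,\overline{B}(t)} - \mu_{ø} \times \mu_{\overline{B}(t)} \big\|_{TV} &= \sum_{x} \mu_{\overline{B}(t)}(x_{\overline{B}(t)})\, \Delta(t,x) \\
&= \sum_{x} \mu_{D(t)}(x_{D(t)})\, \big\| \mu_{ø|D(t)}(\,\cdot\,|x_{D(t)}) - \mu_{ø}(\,\cdot\,) \big\|_{TV} .
\end{aligned}$$

By the same reasoning also

$$\big\| \mu^{<}_{ø,\overline{B}(t)} - \mu^{<}_{ø} \times \mu^{<}_{\overline{B}(t)} \big\|_{TV} = \sum_{x} \mu^{<}_{D(t)}(x_{D(t)})\, \big\| \mu^{<}_{ø|D(t)}(\,\cdot\,|x_{D(t)}) - \mu^{<}_{ø}(\,\cdot\,) \big\|_{TV} .$$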

Since ø $\in B(t) \subseteq B(\ell)$, the conditional law of $x_ø$ given $x_{D(t)}$ is the same
under the graphical model for $G_n$ and the one in which all edges of $\overline{B}(\ell)$
are omitted. Further, by definition of the total variation distance, the value
of $\|\mu_U - \mu^{<}_U\|_{TV}$ is non-decreasing in $U \subseteq B(\ell)$. With the total variation
distance bounded by one, it thus follows from the preceding identities and
the triangle inequality that the left hand side of Eq. (5.9) is bounded above
by

$$\|\mu_{ø} - \mu^{<}_{ø}\|_{TV} + 2 \|\mu_{D(t)} - \mu^{<}_{D(t)}\|_{TV} \le 3 \|\mu_{B(\ell)} - \mu^{<}_{B(\ell)}\|_{TV} .$$

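Next, considering the distribution $\mu_{B(\ell)}(z)$ on the discrete set $\mathcal{Z} = \mathcal{X}^{B(\ell)}$,
notice that, as a consequence of Eq. (1.4) and of the fact that $B(\ell)$ and $\overline{B}(\ell)$
are edge disjoint,

$$\mu_{B(\ell)}(z) = \frac{f(z)\, \rho_2(z)}{\sum_{z' \in \mathcal{Z}} f(z')\, \rho_2(z')} , \tag{5.10}$$

for the $[0,1]$-valued function $f = \mu^{<}_{B(\ell)}$ on $\mathcal{Z}$ and the distribution $\rho_2 = \mu^{>}_{B(\ell)}$
on this set. Clearly, replacing $\rho_2$ in the right hand side of (5.10) by the
uniform distribution $\rho_1 = \rho$ on $\mathcal{Z}$ results with $\sum_{z' \in \mathcal{Z}} f(z') \rho_1(z') = 1/|\mathcal{Z}|$
and, in the notations of (3.24), also with $\widehat{\rho}_1 = f$. We thus deduce from the
latter bound that

$$\|\mu_{B(\ell)} - \mu^{<}_{B(\ell)}\|_{TV} \le \frac{3}{2}\, |\mathcal{Z}|\, \|\mu^{>}_{B(\ell)} - \rho_{B(\ell)}\|_{TV} ,$$

and the proof of the lemma is complete upon noting that $\mu^{>}_{B(\ell)}$ deviates from
the uniform distribution only in terms of its marginal on $D(\ell)$. □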



Proof of Theorem 5.4. Fixing $t \le \ell$, let $\Delta_n$ denote the left hand side of
Eq. (5.9). We claim that its expectation with respect to the Poisson random
model $G_n$ vanishes as $n \to \infty$. First, with $\Delta_n \le 1$ and
$\sup_n \mathbb{P}(\|B_ø(\ell)\| \ge M) \to 0$ as $M \to \infty$, see Proposition 5.12, it suffices to prove
that for any finite $M$, as $n \to \infty$,

$$\mathbb{E}\{\Delta_n\, \mathbb{I}(\|B_ø(\ell)\| < M)\} \to 0 .$$

Decomposing this expectation according to the finitely many events
$\{B_ø(\ell) = H\}$, indexed by rooted, connected multi-graphs $H$ of less than $M$ edges
(counting multiplicities), we have by (5.9) that

$$\mathbb{E}\{\Delta_n\, \mathbb{I}(\|B_ø(\ell)\| < M)\} \le 5\, |\mathcal{X}|^{M} \sum_{\|H\| < M} \mathbb{E}\big\{ \|\mu^{>}_{D_ø(\ell)} - \rho_{D_ø(\ell)}\|_{TV} \,\big|\, B_ø(\ell) = H \big\} ,$$

and it is enough to show that each of the terms on the right hand side
vanishes as $n \to \infty$.

Recall from Proposition 5.11 that each term in the sum is the expectation
with respect to a Poisson graphical model of density $\gamma$ over the collection
$[n] \setminus B_ø(\ell-1)$ of at least $n - M$ vertices. The event $\{B_ø(\ell) = H\}$ fixes the set
$D = D_ø(\ell)$ whose finite size depends only on the rooted multi-graph $H$. By
Proposition 5.14 we thus deduce that conditional on this event, the expected
value of

$$\|\mu^{>}_{D} - \rho_{D}\|_{TV} = \frac{1}{2} \sum_{x_D} \big| \mu^{>}_{D}(x_D) - |\mathcal{X}|^{-|D|} \big| ,$$

vanishes as $n \to \infty$. To recap, we have shown that for any $t \le \ell$, the expected
value of the left hand side of Eq. (5.9) vanishes as $n \to \infty$.

In view of Definition 5.1, this implies that the reconstruction problem is
solvable for $\{G_n\}$ if and only if
$\inf_t \limsup_{\ell \to \infty} \limsup_{n \to \infty} \mathbb{P}\{A_n(t,\ell,\varepsilon)\} > 0$
for some $\varepsilon > 0$, where $A_n(t,\ell,\varepsilon)$ denotes the event

$$\big\| \mu^{<}_{ø,\overline{B}_ø(t)} - \mu^{<}_{ø} \times \mu^{<}_{\overline{B}_ø(t)} \big\|_{TV} \ge \varepsilon .$$

Recall that $\mu^{<}(\cdot)$ is the canonical measure for the edge-independent random
specification on the random graph $B_ø(\ell)$ and that almost surely the uniformly
sparse Poisson random graphs $\{G_n\}$ converge locally to the Galton-Watson
tree $T$ of Poisson($2\gamma$) offspring distribution. Applying Lemma 2.16
for the uniformly bounded function $\mathbb{I}(A_n(t,\ell,\varepsilon))$ of $B_ø(\ell)$ and averaging
first under our uniform choice of ø in $[n]$, we deduce that
$\mathbb{P}\{A_n(t,\ell,\varepsilon)\} \to \mathbb{P}\{A_\infty(t,\ell,\varepsilon)\}$, where $A_\infty(t,\ell,\varepsilon)$ denotes the event on the
left hand side of (5.3). That is, $\{G_n\}$ is solvable if and only if
$\inf_t \limsup_{\ell \to \infty} \mathbb{P}\{A_\infty(t,\ell,\varepsilon)\} > 0$ for some $\varepsilon > 0$, which is precisely the
definition of tree-solvability. □

Proof of Theorem 5.8. Following [74], this proof consists of four steps: 

(1) It is shown in [83] that for regular trees of degree $2\gamma$ the reconstruction
threshold $\gamma_{r,\mathrm{tree}}(q)$ for proper $q$-colorings grows with $q \to \infty$ as in (5.6). In
the large $\gamma$ limit considered here, a Poisson($2\gamma$) random variable is tightly
concentrated around its mean. Hence, as noted in [83], the result (5.6)
extends straightforwardly to the case of random Galton-Watson trees with
offspring distribution Poisson($2\gamma$).

(2) Given two balanced proper $q$-colorings $\underline{x}^{(1)}$, $\underline{x}^{(2)}$ of $G_n$ (a $q$-coloring is
balanced if it has exactly $n/q$ vertices of each color), recall that their joint
type is the $q$-dimensional matrix $\nu(\cdot,\cdot)$ such that $\nu(x,y)$ counts the fraction
of vertices $i \in [n]$ with $x^{(1)}_i = x$ and $x^{(2)}_i = y$. Let $Z_b(\nu)$ denote the number
of balanced pairs of proper $q$-colorings $\underline{x}^{(1)}$, $\underline{x}^{(2)}$ of $G_n$ with the given joint
type $\nu$. For $\gamma < q \log q - O(1)$, while proving Theorem 4.4 it is shown in [4]
that $\mathbb{E} Z_b(\nu) / \mathbb{E} Z_b(\overline{\nu}) \to 0$ exponentially in $n$ (where $\overline{\nu}(x,y) = 1/q^2$ denotes
the uniform joint type).

(3) The preceding result implies that, for any $\varepsilon > 0$ and some non-random
$\delta_n(\varepsilon) \to 0$, the uniform measure over proper $q$-colorings of an instance of
the random Poisson multi-graph $G_n$ is with high probability $(\varepsilon, \delta_n)$-spherical
(see Definition 4.1). Notice that this implication is not straightforward as it
requires bounding the expected ratio of $Z_b(\nu)$ to the total number of pairs
of proper $q$-colorings. We refer to [74] for this part of the argument.

(4) As mentioned in Remark 5.6, by Theorem 5.4 the latter sphericity
condition yields that with high probability the $q$-colorings reconstruction
problem is solvable if and only if it is tree-solvable. Therefore, the result of
step (1) about the tree-reconstruction threshold $\gamma_{r,\mathrm{tree}}(q)$ completes the
proof. □

6. XORSAT and finite-size scaling 

XORSAT is a special constraint satisfaction problem (CSP) first introduced
in [25]. An instance of XORSAT is defined by a pair $\mathcal{F} = (\mathbb{H}, b)$, where $\mathbb{H}$ is
an $m \times n$ binary matrix and $b$ is a binary vector of length $m$. A solution of
this instance is just a solution of the linear system

$$\mathbb{H} x = b \mod 2 . \tag{6.1}$$

In this section we shall focus on the $l$-XORSAT problem that is defined by
requiring that $\mathbb{H}$ has $l$ non-zero entries per row. Throughout this section we
assume $l \ge 3$.

It is quite natural to associate to an instance $\mathcal{F} = (\mathbb{H}, b)$ the uniform
measure over its solutions, $\mu(x)$. If the positions of the non-vanishing entries
of the $a$-th row of $\mathbb{H}$ are denoted by $i_1(a), \ldots, i_l(a)$, the latter takes the form

$$\mu(x) = \frac{1}{Z_{\mathbb{H},b}} \prod_{a=1}^{m} \mathbb{I}\big( x_{i_1(a)} \oplus \cdots \oplus x_{i_l(a)} = b_a \big) , \tag{6.2}$$

where $\oplus$ denotes sum modulo 2, and $Z_{\mathbb{H},b}$ is the number of solutions of
the linear system. A factor graph representation is naturally associated to
this measure analogously to what we did in Section 1.2 for the satisfiability
problem.

In the following we study the set of solutions (equivalently, the uniform
measure $\mu(x)$ over such solutions) of random $l$-XORSAT instances,
distributed according to various ensembles. The relevant control parameter is
the number of equations per variable $\alpha = m/n$. For the ensembles discussed
here there exists a critical value $\alpha_s(l)$ such that, if $\alpha < \alpha_s(l)$, a random
instance has solutions with high probability. Vice versa, if $\alpha > \alpha_s(l)$, a random
instance typically does not have solutions.

In the regime in which random instances have solutions, the structure
of the solution set changes dramatically as $\alpha$ crosses a critical value $\alpha_d(l)$.
The two regimes are characterized as follows (where all statements should
be understood as holding with high probability).

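I. $\alpha < \alpha_d(l)$. The set of solutions of the linear system $\mathbb{H}x = b$ forms a
'well connected lump.' More precisely, there exists $c = c(\varepsilon) < \infty$ such
that if a set $\Omega \subseteq \{0,1\}^n$ contains at least one solution and at most
half of the solutions, then for all $\varepsilon > 0$ and $n$,

$$\frac{\mu(\partial_{\varepsilon}\Omega)}{\mu(\Omega)} \ge n^{-c} . \tag{6.3}$$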

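II. $\alpha_d(l) < \alpha < \alpha_s(l)$. The set of solutions is 'clustered.' There exists
a partition of the hypercube $\{0,1\}^n$ into sets $\Omega_1, \ldots, \Omega_{\mathcal{N}}$ such that
$\mu(\Omega_\ell) > 0$ and

$$\mu(\partial_{\varepsilon}\Omega_\ell) = 0 , \tag{6.4}$$

for some $\varepsilon > 0$ and all $\ell$. Further $\mathcal{N} = e^{n\Sigma + o(n)}$ for some $\Sigma > 0$,
and each subset $\Omega_\ell$ contains the same number of solutions
$|\{x \in \Omega_\ell : \mathbb{H}x = b\}| = e^{ns + o(n)}$, for some $s > 0$. Finally, the uniform measure
over solutions in $\Omega_\ell$, namely $\mu_\ell(\cdot) = \mu(\cdot\,|\,\Omega_\ell)$, satisfies the condition
(6.3).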

The fact that the set of solutions of XORSAT forms an affine space makes
it a much simpler problem, both computationally and in terms of analysis.
Nevertheless, XORSAT shares many common features with other CSPs. For
example, regimes I and II are analogous to the first two regimes introduced in
Section 4.1 for the coloring problem. In particular, the measure $\mu(\cdot)$ exhibits
coexistence in the second regime, but not in the first. This phenomenon is
seen in many random CSP ensembles following the framework of Section
1.2.2, well beyond the cases of coloring and XORSAT (see [65] for further
details and references). Some rigorous results in this direction are derived
in [6, 68], but the precise parameter range for the various regimes remains
a conjecture, yet to be proved. Even for XORSAT, where the critical values
$\alpha_d(l)$, $\alpha_s(l)$ have been determined rigorously (see [23, 72]), the picture we
described is not yet completely proved. Indeed, as of now, we have neither
a proof of absence of coexistence for $\alpha < \alpha_d(l)$, i.e. the estimate Eq. (6.3),
nor a proof of its analog for the uniform measure when $\alpha > \alpha_d(l)$.

In Section 6.1 we focus on the case of random regular graphs, moving 
in Section 6.2 to uniformly random ensembles and their 2-cores, for which 
Sections 6.3 to 6.6 explore the dynamical (or 'clustering') phase transition 
and derive its precise behavior at moderate values of n. 

6.1. XORSAT on random regular graphs 

We begin with some general facts about the XORSAT problem. Given a
XORSAT instance $\mathcal{F} = (\mathbb{H}, b)$, we denote by $r(\mathbb{H})$ the rank of $\mathbb{H}$ over the
finite field GF(2), and by $Z_{\mathbb{H}} \ge 1$ the number of solutions $x$ of $\mathbb{H}x = 0$
mod 2. From linear algebra we know that $\mathcal{F}$ is satisfiable (i.e. there exists a
solution for the system $\mathbb{H}x = b$ mod 2) if and only if $b$ is in the image of
$\mathbb{H}$. This occurs for precisely $2^{r(\mathbb{H})}$ of the $2^m$ possible binary vectors $b$ and in
particular it occurs for $b = 0$. If $\mathcal{F}$ is satisfiable then the set of all solutions is
an affine space of dimension $n - r(\mathbb{H})$ over GF(2), hence of size $Z_{\mathbb{H}} = 2^{n - r(\mathbb{H})}$.
Further, $r(\mathbb{H}) \le m$ with $r(\mathbb{H}) = m$ if and only if the rows of $\mathbb{H}$ are linearly
independent (over GF(2)), or equivalently iff $Z_{\mathbb{H}^T} = 1$ (where $\mathbb{H}^T$ denotes
the transpose of the matrix $\mathbb{H}$).
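
These linear-algebra facts are easy to check on small instances. The sketch below is an illustration only: `gf2_rank` performs Gaussian elimination over GF(2) on a matrix given as a list of 0/1 rows, and `count_solutions` enumerates all of $\{0,1\}^n$ by brute force (so it is meant for tiny $n$). Both function names are made up for this example.

```python
from itertools import product


def gf2_rank(matrix):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    rows = [list(r) for r in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a pivot row with a 1 in this column among the unused rows.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def count_solutions(H, b):
    """Brute-force count of x in {0,1}^n with H x = b mod 2."""
    n = len(H[0])
    return sum(
        all(sum(h * x for h, x in zip(row, xs)) % 2 == rhs for row, rhs in zip(H, b))
        for xs in product((0, 1), repeat=n)
    )
```

For the $3 \times 3$ matrix with rows `[1,1,0]`, `[0,1,1]`, `[1,0,1]` the rank is 2, so a satisfiable right-hand side has $2^{3-2} = 2$ solutions and exactly $2^2$ of the $2^3$ vectors $b$ are satisfiable.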

Let us now consider a generic distribution over instances $\mathcal{F} = (\mathbb{H}, b)$ such
that $b$ is chosen uniformly from $\{0,1\}^m$ independently of the matrix $\mathbb{H}$. It
follows from the preceding that

$$\mathbb{P}(\mathcal{F} \text{ is satisfiable}) = 2^{n-m}\, \mathbb{E}[1/Z_{\mathbb{H}}] \ge \mathbb{P}(Z_{\mathbb{H}^T} = 1) . \tag{6.5}$$

By these considerations also

$$\mathbb{P}(\mathcal{F} \text{ is satisfiable}) \le \frac{1}{2} + \frac{1}{2}\, \mathbb{P}(Z_{\mathbb{H}^T} = 1) .$$

As mentioned already, the probability that a random instance is satisfiable,
$\mathbb{P}(\mathcal{F}$ is satisfiable$)$, abruptly drops from near one to near zero when the
number of equations per variable crosses a threshold $\alpha_s(l)$. As a consequence
of our bounds, $\mathbb{P}(Z_{\mathbb{H}^T} = 1)$ abruptly drops from near one to near zero at the
same threshold.

Suppose that $\mathbb{H}$ is the parity matrix of a factor graph $G = (V, F, E)$ which
may have multiple edges. That is, each entry $\mathbb{H}_{ai}$ of the binary matrix $\mathbb{H}$ is
just the parity of the multiplicity of edge $(a,i)$ in $G$. We associate to the
instance $\mathcal{F}$ an energy function $E_{\mathbb{H},b}(x)$ given by the number of unsatisfied
equations in the linear system $\mathbb{H}x = b$, with the corresponding partition
function

$$Z_{\mathbb{H},b}(\beta) = \sum_{x} \exp\{-2\beta E_{\mathbb{H},b}(x)\} . \tag{6.6}$$

In particular, $Z_{\mathbb{H}} = \lim_{\beta \to \infty} Z_{\mathbb{H},b}(\beta)$ whenever the instance is satisfiable.
Moreover, it is easy to show that $Z_{\mathbb{H},b}(\beta)$ is independent of $b$ whenever the
instance is satisfiable.

We thus proceed to apply the general approach of high-temperature
expansion on $Z_{\mathbb{H},0}(\beta)$, which here yields important implications for all $\beta > 0$
and in particular, also on $Z_{\mathbb{H}}$. For doing so, it is convenient to map the
variable domain $\{0,1\}$ to $\{+1,-1\}$ and rewrite $Z_{\mathbb{H},0}(\beta)$ as the partition function
of a generalized (ferromagnetic) Ising model of the form (2.12). That is,

$$Z_{\mathbb{H},0}(\beta) = e^{-\beta|F|} \sum_{x \in \{+1,-1\}^V} \exp\Big\{ \beta \sum_{a \in F} x_a \Big\} \equiv e^{-\beta|F|}\, Z_G(\beta) , \tag{6.7}$$

where $x_a \equiv \prod_{i \in \partial a} x_i$ for each $a \in F$. We also introduce the notion of a
hyper-loop in a factor graph $G = (V, F, E)$, which is a subset $F'$ of function
nodes such that every variable node $i \in V$ has an even degree in the induced
subgraph $G' = (V, F', E')$.

Lemma 6.1. Set $N_G(0) = 1$ and let $N_G(\ell)$ denote the number of hyper-loops
of size $\ell \ge 1$ in a factor graph $G = (V, F, E)$. Then, for any $\beta \in \mathbb{R}$,

$$Z_G(\beta) = 2^{|V|} (\cosh\beta)^{|F|} \sum_{\ell=0}^{|F|} N_G(\ell)\, (\tanh\beta)^{\ell} . \tag{6.8}$$

Further, if $Z_{\mathbb{H}^T} = 1$ with $\mathbb{H}$ the parity matrix of $G$ then

$$Z_G(\beta) = 2^{|V|} (\cosh\beta)^{|F|} .$$

Proof. Observe that $e^{\beta x_a} = \cosh(\beta)[1 + x_a (\tanh\beta)]$ for any function node
$a \in F$ and any $x \in \{+1,-1\}^V$. Thus, setting $F = [m]$, we have the following
'high-temperature expansion' of $Z_G(\beta)$ as a polynomial in $(\tanh\beta)$,

$$\begin{aligned}
Z_G(\beta) &= (\cosh\beta)^m \sum_{x} \prod_{a \in F} [1 + x_a (\tanh\beta)] \\
&= (\cosh\beta)^m \sum_{F' \subseteq [m]} (\tanh\beta)^{|F'|} \sum_{x} \prod_{a \in F'} x_a .
\end{aligned}$$

Since $x_i^2 = 1$ for each $i \in V$ we see that $\prod_{a \in F'} x_a$ is simply the product of $x_i$
over all variable nodes $i$ that have an odd degree in the induced subgraph $G'$.
The sum of this quantity over all $x \in \{+1,-1\}^V$ is by symmetry zero, unless
$F'$ is either an empty set or a hyper-loop of $G$, in which case this sum is just
the number of such binary vectors $x$, that is $2^{|V|}$. Therefore, upon collecting
together all hyper-loops $F'$ in $G$ of the same size we get the stated formula
(6.8). To complete the proof of the lemma note that the sum of columns of
the transpose $\mathbb{H}^T$ of the parity matrix of $G$ corresponding to the function
nodes in a hyper-loop $F'$ in $G$ must be the zero vector over the field GF(2).
Hence, the existence of a hyper-loop in $G$ provides a non-zero solution of
$\mathbb{H}^T y = 0$ mod 2 (in addition to the trivial zero solution). Consequently, if
$Z_{\mathbb{H}^T} = 1$ then $N_G(\ell) = 0$ for all $\ell \ge 1$, yielding the stated explicit expression
for $Z_G(\beta)$. □
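
The identity (6.8) can be checked numerically on a small factor graph. In the sketch below (an illustration under stated assumptions, not part of the original argument) a factor graph is encoded as a list of function nodes, each given by the tuple of variable indices it touches, with repetitions standing for multiple edges. `z_direct` sums $\exp\{\beta \sum_a x_a\}$ over all spin configurations, while `z_expansion` counts hyper-loops by exhaustive enumeration of subsets $F'$; both are exponential-time and only meant for toy sizes.

```python
import math
from itertools import combinations, product


def z_direct(n_vars, factors, beta):
    """Z_G(beta) = sum over x in {+1,-1}^V of exp(beta * sum_a prod_{i in a} x_i)."""
    total = 0.0
    for x in product((1, -1), repeat=n_vars):
        total += math.exp(beta * sum(math.prod(x[i] for i in fa) for fa in factors))
    return total


def hyperloop_counts(n_vars, factors):
    """counts[size] = number of hyper-loops with that many function nodes (counts[0] = 1)."""
    m = len(factors)
    counts = [1] + [0] * m
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            degree = [0] * n_vars
            for a in subset:
                for i in factors[a]:
                    degree[i] += 1
            if all(d % 2 == 0 for d in degree):
                counts[size] += 1
    return counts


def z_expansion(n_vars, factors, beta):
    """Right-hand side of the high-temperature expansion (6.8)."""
    counts = hyperloop_counts(n_vars, factors)
    series = sum(c * math.tanh(beta) ** size for size, c in enumerate(counts))
    return 2 ** n_vars * math.cosh(beta) ** len(factors) * series
```

For instance, with four variables and function nodes `(0, 1, 2)`, `(1, 2, 3)`, `(0, 3)` the only hyper-loop is the full set of three function nodes, and the two evaluations of $Z_G(\beta)$ agree up to floating-point error.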

We now consider the $k$-XORSAT for ensembles $\mathcal{G}_{l,k}(n,m)$ of random
$(l,k)$-regular graphs drawn from the corresponding configuration model. Such an
ensemble is defined whenever $nl = mk$ as follows. Attach $l$ half-edges to each
variable node $i \in V$, and $k$ half-edges to each function node $a \in F$. Draw a
uniformly random permutation over $nl$ elements, and connect edges on the
two sides accordingly. We then have the following result about uniqueness
of solutions of random regular linear systems.
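
The configuration-model construction just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and shuffling the list of function-node half-edges is used here as one way to realise the uniformly random matching of half-edges. The second helper builds the parity matrix, whose entry $(a,i)$ is the parity of the multiplicity of edge $(a,i)$.

```python
import random


def sample_regular_factor_graph(n, m, l, k, rng=None):
    """Return a list of edges (a, i) between function node a and variable node i."""
    if n * l != m * k:
        raise ValueError("the ensemble is defined only when n*l == m*k")
    rng = rng or random.Random()
    variable_half_edges = [i for i in range(n) for _ in range(l)]
    function_half_edges = [a for a in range(m) for _ in range(k)]
    # A uniform shuffle of one side gives a uniformly random matching of half-edges.
    rng.shuffle(function_half_edges)
    return list(zip(function_half_edges, variable_half_edges))


def parity_matrix(n, m, edges):
    """H[a][i] is the parity of the multiplicity of edge (a, i) in the multi-graph."""
    H = [[0] * n for _ in range(m)]
    for a, i in edges:
        H[a][i] ^= 1
    return H
```

Because multiple edges are allowed, a row of the parity matrix may contain fewer than $k$ ones, but its number of ones always has the same parity as $k$.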

Theorem 6.2. Let $\mathbb{H}$ denote the $m \times n$ parity matrix of a random
$(l,k)$-regular factor graph from $\mathcal{G}_{l,k}(n,m)$, with $l > k \ge 2$. Then, the linear system
$\mathbb{H}x = 0$ mod 2 has, with high probability as $n \to \infty$, the unique solution
$x = 0$.

Proof. Let $Z_{\mathbb{H}}(w)$ denote the number of solutions of $\mathbb{H}x = 0$ with $w$ non-zero
entries. Such a solution corresponds to a coloring, by say red, of $w$ vertices of
the multi-graph $G$, and by, say blue, of the remaining $n - w$ vertices, while
having an even number of red half-edges at each function node. A convenient
way to compute $\mathbb{E} Z_{\mathbb{H}}(w)$ is thus to divide the number of possible graph
colorings with this property by the total size $(nl)!$ of the ensemble $\mathcal{G}_{l,k}(n,m)$.
Indeed, per integers $(m_r, r = 0, \ldots, k)$ that sum to $m$ there are

$$\binom{m}{m_0, \ldots, m_k} \prod_{r=0}^{k} \binom{k}{r}^{m_r}$$

ways of coloring the $mk$ half-edges of function nodes such that $m_r$ nodes
have $r$ red half-edges, for $r = 0, \ldots, k$. There are $\binom{n}{w}$ ways of selecting red
vertices and $(wl)!(nl - wl)!$ color consistent ways of matching half-edges of
factor nodes with those of vertices, provided one has the same number of red
half-edges in both collections, that is $\sum_r r\, m_r = wl$. The value of $\mathbb{E} Z_{\mathbb{H}}(w)$
is thus obtained by putting all this together, and summing over all choices
of $(m_0, \ldots, m_k)$ such that $m_r = 0$ for odd values of $r$. Setting $w = n\omega$ and
using Stirling's formula one finds that $\mathbb{E} Z_{\mathbb{H}}(w) = \exp(n\phi(\omega) + o(n))$ for any
fixed $\omega \in (0,1)$, with an explicit expression for $\phi(\cdot)$ (c.f. [65, Section 11.2.1]
and the references therein). For $l > k \ge 2$ the only local maximum of $\phi(\omega)$
is at $\omega = 1/2$, with $\phi(0) = \phi(1) = 0$ and $\phi(1/2) < 0$. Hence, in this case
$\phi(\omega) < 0$ for all $\omega \in (0,1)$. Further, from the formula for $\mathbb{E} Z_{\mathbb{H}}(w)$ one can
show that for $\kappa > 0$ small enough the sum of $\mathbb{E} Z_{\mathbb{H}}(w)$ over $1 \le w \le \kappa n$ and
$n - w \le \kappa n$ decays to zero with $n$. Therefore,

$$\lim_{n \to \infty} \sum_{w=1}^{n} \mathbb{E} Z_{\mathbb{H}}(w) = 0 ,$$

which clearly implies our thesis. □
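
The first-moment count used in the proof can be evaluated exactly for small parameters. The sketch below is one possible way to do so, under the assumption that the sum over $(m_0, \ldots, m_k)$ is organised as a polynomial coefficient: the number of ways to mark `reds` function-node half-edges with an even number at every function node is the coefficient of $z^{\text{reds}}$ in $\big(\sum_{r \text{ even}} \binom{k}{r} z^r\big)^m$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def even_marking_count(m, k, reds):
    """Coefficient of z**reds in (sum over even r of C(k, r) * z**r) ** m."""
    base = [comb(k, r) if r % 2 == 0 else 0 for r in range(k + 1)]
    poly = [1]
    for _ in range(m):
        new = [0] * (len(poly) + k)
        for i, p in enumerate(poly):
            if p:
                for r, c in enumerate(base):
                    new[i + r] += p * c
        poly = new
    return poly[reds] if reds < len(poly) else 0


def expected_solutions(n, m, l, k, w):
    """Exact expected number of weight-w solutions of H x = 0 in the (l, k)-regular ensemble."""
    if n * l != m * k:
        raise ValueError("the ensemble is defined only when n*l == m*k")
    ways = even_marking_count(m, k, w * l)
    numerator = comb(n, w) * ways * factorial(w * l) * factorial(n * l - w * l)
    return Fraction(numerator, factorial(n * l))
```

As a hand-checkable case, for $n = 3$, $m = 2$, $l = 2$, $k = 3$ a weight-one vector is a solution exactly when both half-edges of the chosen variable land on the same function node, which has probability $2/5$, so the expected count is $3 \cdot 2/5 = 6/5$.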

We have the following consequence about satisfiability of XORSAT for
random $(l,k)$-regular factor graphs.

Corollary 6.3. Choose a random $(l,k)$-regular factor graph $G$ from the
ensemble $\mathcal{G}_{l,k}(n,m)$, with $k > l \ge 2$. Then, the probability that
$Z_G(\beta) = 2^n (\cosh\beta)^{nl/k}$ goes to one as $n \to \infty$ and so does the probability that
$k$-XORSAT with the corresponding parity matrix $\mathbb{H}$ has $2^{ns}$ solutions for
$s = 1 - l/k$.

Proof. Let $\mathbb{H}$ be the parity matrix of a randomly chosen $(l,k)$-regular factor
graph $G$ from ensemble $\mathcal{G}_{l,k}(n,m)$. Then, $\mathbb{H}^T$ has the law of the parity matrix
for a random $(k,l)$-regular factor graph from ensemble $\mathcal{G}_{k,l}(m,n)$. Thus, from
Theorem 6.2 we know that $\mathbb{P}(Z_{\mathbb{H}^T} = 1) \to 1$ as $n \to \infty$ and by Lemma
6.1 with the same probabilities also $Z_G(\beta) = 2^n (\cosh\beta)^{nl/k}$ (as here $|V| = n$
and $|F| = nl/k$). We complete the proof upon recalling that if $Z_{\mathbb{H}^T} = 1$
then $r(\mathbb{H}) = m = |F|$ and there are $2^{n-m}$ solutions $x$ of the corresponding
XORSAT problem (for any choice of $b$). □

See [65, Chapter 18] for more information on XORSAT models, focusing
on the zero-temperature case.

6.2. Hyper-loops, hypergraphs, cores and a peeling algorithm 

As we saw in the previous section, if the bipartite graph $G$ associated to
the matrix $\mathbb{H}$ does not contain hyper-loops, then the linear system $\mathbb{H}x = b$
is solvable for any $b$. This is what happens for $\alpha < \alpha_s(l)$: a random matrix
$\mathbb{H}$ with $l$ non-zero elements per row is, with high probability, free from
hyper-loops. Vice versa, when $\alpha > \alpha_s(l)$ the matrix $\mathbb{H}$ contains, with high
probability, $\Theta(n)$ hyper-loops. Consequently, the linear system $\mathbb{H}x = b$ is
solvable only for $2^{m - \Theta(n)}$ of the vectors $b$. If $b$ is chosen uniformly at random,
this implies that $\mathbb{P}\{\mathcal{F}$ is satisfiable$\} \to 0$ as $n \to \infty$.

Remarkably, the clustering threshold $\alpha_d(l)$ coincides with the threshold
for appearance of a specific subgraph of the bipartite graph $G$, called the core
of $G$. The definition of the core is more conveniently given in the language
of hypergraphs. This is an equivalent description of factor graphs, where
the hypergraph corresponding to $G = (V, F, E)$ is formed by associating
with each factor node $a \in F$ the hyper-edge (i.e. a subset of vertices in $V$)
$\partial a$ consisting of all vertices $i \in V$ such that $(i,a) \in E$. The same applies
for factor multi-graphs, in which case a vertex $i \in V$ may appear with
multiplicity larger than one in some hyper-edges.

Definition 6.4. The $r$-core of a hyper-graph $G$ is the unique subgraph
obtained by recursively removing all vertices of degree less than $r$ (when
counting multiplicities of vertices in hyper-edges). In particular, the 2-core,
hereafter called the core of $G$, is the maximal collection of hyper-edges having
no vertex appearing in only one of them (and we use the same term for the
induced subgraph).
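
The recursive removal in this definition translates directly into a peeling procedure. The following Python sketch computes the 2-core of a hyper-graph given as a list of hyper-edges (tuples of vertex labels, with repetitions allowed to represent multiplicities). It is a simple quadratic-time illustration; the name `two_core` and the input encoding are assumptions made for this example.

```python
from collections import Counter


def two_core(hyperedges):
    """Return the sorted indices of the hyper-edges that survive peeling (the 2-core)."""
    alive = set(range(len(hyperedges)))
    degree = Counter()
    for e in alive:
        degree.update(hyperedges[e])  # degrees count multiplicities
    changed = True
    while changed:
        changed = False
        for e in sorted(alive):
            # A hyper-edge containing a vertex of degree one cannot be in the core.
            if any(degree[v] < 2 for v in hyperedges[e]):
                alive.remove(e)
                degree.subtract(hyperedges[e])
                changed = True
    return sorted(alive)
```

For example, the hyper-edges `(0, 1, 2)`, `(0, 1, 3)`, `(2, 3, 4)` peel away completely (vertex 4 has degree one, and removing its hyper-edge exposes vertices 2 and 3), whereas two copies of `(0, 1, 2)` form a non-empty core.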

Obviously, if $G$ contains a non-empty hyper-loop, it also contains a
non-empty core. It turns out that the probability that a random hypergraph
contains a non-empty core grows sharply from near zero to near one as
the number of hyper-edges crosses a threshold which coincides with the
clustering threshold $\alpha_d(l)$ of XORSAT.

Beyond XORSAT, the core of a hyper-graph plays an important role in 
the analysis of many combinatorial problems. 

For example, Karp and Sipser [55] consider the problem of finding the
largest possible matching (i.e. vertex disjoint set of edges) in a graph $G$.
They propose a simple peeling algorithm that recursively selects an edge
$e = (i,j) \in G$ for which the vertex $i$ has degree one, as long as such an
edge exists, and upon including $e$ in the matching, the algorithm removes
it from $G$ together with all edges incident on $j$ (that can no longer belong
to the matching). Whenever the algorithm successfully matches all vertices,
the resulting matching can be shown to have maximal size. Note that this
happens if and only if the core of the hyper-graph $\widetilde{G}$ is empty, where $\widetilde{G}$ has a
c-node $\widetilde{e}$ per edge $e$ of $G$ and a v-node $\widetilde{i}$ per vertex $i$ of degree two or more in
$G$ that is incident on $\widetilde{e}$ in $\widetilde{G}$ if and only if $e$ is incident on $i$ in $G$. Consequently,
the performance of the Karp-Sipser algorithm for a randomly selected graph
has to do with the probability of non-empty core in the corresponding graph
ensemble. For example, [55] analyze the asymptotics of this probability for
a uniformly chosen random graph of $N$ vertices and $M = \lfloor Nc/2 \rfloor$ edges, as
$N \to \infty$ (c.f. [12, 34] for recent contributions).
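
The leaf-removal step described above can be sketched as follows. This is a minimal illustration of the peeling phase only, assuming a simple graph (no self-loops or repeated edges) given as a list of vertex pairs; the name `karp_sipser_peel` and the return convention are choices made for this example, and nothing is done about the vertices left over when no degree-one vertex remains.

```python
def karp_sipser_peel(edges):
    """Greedy leaf removal: returns (matching, vertices still carrying edges)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    matching = []
    stack = [v for v in adj if len(adj[v]) == 1]
    while stack:
        i = stack.pop()
        if len(adj[i]) != 1:  # stale entry: the degree of i changed meanwhile
            continue
        (j,) = adj[i]
        matching.append((i, j))
        # Remove j together with all its incident edges (this also isolates i).
        for w in adj[j]:
            adj[w].discard(j)
            if len(adj[w]) == 1:
                stack.append(w)
        adj[j] = set()
    leftover = sorted(v for v in adj if adj[v])
    return matching, leftover
```

On the path 0-1-2-3 the peeling matches all four vertices with two edges, while on a triangle there is no vertex of degree one, so the matching is empty and all three vertices are left over.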

A second example deals with the decoding of a noisy message when com- 
municating over the binary erasure channel with a low-density parity-check 
code ensemble. This amounts to finding the unique solution of a linear sys- 
tem over GF(2) (the solution exists by construction, but is not necessarily 
unique, in which case decoding fails). If the linear system includes an equa- 
tion with only one variable, we thus determine the value of this variable, 
and substitute it throughout the system. Repeated recursively, this proce- 
dure either determines all the variables, thus yielding the unique solution 
of the system, or halts on a linear sub-system each of whose equations in- 
volves at least two variables. While such an algorithm is not optimal (when 
it halts, the resulting linear sub-system might still have a unique solution), 
it is the simplest instance of the widely used belief propagation decoding 
strategy, that has proved extremely successful. For example, on properly 
optimized code ensembles, this algorithm has been shown to achieve the 
theoretical limits for reliable communication, i.e., Shannon's channel capac- 
ity (see [62]). Here a hyper-edge of the hyper-graph G is associated to each 
variable, and a vertex is associated to each equation, or parity check, and 
the preceding decoding scheme successfully finds the unique solution if and 
only if the core of G is empty. 
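
The substitution procedure described in this paragraph can be written down directly. In the sketch below (illustrative names, and an input encoding chosen for this example) each equation over GF(2) is a pair consisting of the set of variables it involves and its right-hand side bit; the function repeatedly picks an equation with a single variable, fixes that variable, and substitutes it everywhere. It assumes, as in the text, that the system is consistent.

```python
def peel_decode(equations):
    """Return (values determined by peeling, equations still involving unknowns)."""
    system = [(set(variables), rhs & 1) for variables, rhs in equations]
    known = {}
    while True:
        single = next((eq for eq in system if len(eq[0]) == 1), None)
        if single is None:
            break
        (v,) = single[0]
        value = single[1]
        known[v] = value
        # Substitute x_v = value throughout the system.
        system = [
            (vs - {v}, rhs ^ value) if v in vs else (vs, rhs)
            for vs, rhs in system
        ]
    residual = [(vs, rhs) for vs, rhs in system if vs]
    return known, residual
```

When the returned residual list is empty the procedure has determined every variable that appears in the system; otherwise it has halted on a sub-system in which every equation involves at least two variables, which is the situation of a non-empty core.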

In coding theory one refers to each variable as a v-node of the correspond- 
ing bipartite factor graph representation of G and to each parity-check as 
a c-node of this factor graph. This coding theory setting is dual to the one 
considered in XORSAT. Indeed, as we have already seen, satisfiability of
an instance $(\mathbb{H}, b)$ for most choices of $b$ is equivalent to the uniqueness of
the solution of $\mathbb{H}^{\dagger} x = 0$. Hereafter we adopt this 'dual' but equivalent
coding theory language, considering a hyper-graph $G$ chosen uniformly from
an ensemble $\mathcal{G}_l(n,m)$ with $n$ hyper-edges (or v-nodes), each of which is a
collection of $l \ge 3$ vertices (or c-nodes), from the vertex set $[m]$. Note that
as we passed to the dual formulation, we also inverted the roles of $n$ and $m$.
More precisely, each hyper-graph in $\mathcal{G} = \mathcal{G}_l(n,m)$ is described by an ordered
list of edges, i.e. couples $(i,a)$ with $i \in [n]$ and $a \in [m]$

$$E = [(1,a_1), (1,a_2), \dots, (1,a_l); (2,a_{l+1}), \dots; (n,a_{(n-1)l+1}), \dots, (n,a_{nl})],$$

where a couple $(i,a)$ appears before $(j,b)$ whenever $i < j$ and each v-node $i$
appears exactly $l$ times in the list, with $l \ge 3$ a fixed integer parameter. In
this configuration model the degree of a v-node $i$ (or c-node $a$) refers to the
number of edges $(i,b)$ (respectively $(j,a)$) in $E$ to which it belongs (which
corresponds to counting hyper-edges and vertices with their multiplicity).

To sample $G$ from the uniform distribution over $\mathcal{G}$ consider the v-nodes in
order, $i = 1, \dots, n$, choosing for each v-node and $j = 1, \dots, l$, independently
and uniformly at random a c-node $a = a_{(i-1)l+j} \in [m]$ and adding the
couple $(i,a)$ to the list $E$. Alternatively, to sample from this distribution
first attribute sockets $(i-1)l+1, \dots, il$ to the $i$-th v-node, $i = 1, \dots, n$, then
attribute $k_a$ sockets to each c-node $a$, where the $k_a$'s are mutually independent
Poisson($\zeta$) random variables, conditioned upon their sum being $nl$ (these
sockets are ordered using any pre-established convention). Finally, connect
the v-node sockets to the c-node sockets according to a permutation $\sigma$ of
$\{1, \dots, nl\}$ that is chosen uniformly at random and independently of the
choice of the $k_a$'s.
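
The first of the two sampling procedures is short enough to write out. The following is a minimal sketch, assuming 1-based labels for v-nodes and c-nodes; the helper names `sample_hypergraph` and `c_node_degrees` are hypothetical.

```python
import random


def sample_hypergraph(n, m, l, rng=random):
    """Ordered edge list E of a hyper-graph drawn from the ensemble with
    n v-nodes (hyper-edges), m c-nodes (vertices) and l sockets per v-node.

    For each v-node i = 1..n, in order, l c-nodes are drawn independently and
    uniformly from 1..m (repetitions allowed, as in the configuration model).
    """
    return [(i, rng.randint(1, m)) for i in range(1, n + 1) for _ in range(l)]


def c_node_degrees(edges, m):
    """Degree of each c-node, counting repeated couples with multiplicity."""
    deg = [0] * (m + 1)  # index 0 is unused
    for _, a in edges:
        deg[a] += 1
    return deg
```

By construction every v-node appears exactly $l$ times in the returned list and the c-node degrees sum to $nl$; passing a seeded `random.Random` instance as `rng` makes a run reproducible.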

In the sequel we take $m = \lfloor n\rho \rfloor$ for $\rho = l/\gamma > 0$ bounded away from $0$
and $\infty$ and study the large $n$ asymptotics of the probability

$$P_l(n,\rho) = \mathbb{P}\{G \in \mathcal{G}_l(n,m) \text{ has a non-empty core}\} \tag{6.9}$$

that a hyper-graph $G$ of this distribution has a non-empty core. Setting $\mathbb{H}^{\dagger}$
as the parity matrix of a uniformly chosen $G$ from $\mathcal{G}_l(n,m)$ corresponds to
a binary matrix $\mathbb{H}$ chosen uniformly (according to a configuration model),
among all $n \times m$ matrices with $l$ non-zero entries per row. That is, the
parameter $\rho$ corresponds to $1/\alpha$ in the $l$-XORSAT.

6.3. The approximation by a smooth Markov kernel 

Our approach to $P_l(n,\rho)$ is by analyzing whether the process of sequentially
peeling, or decimating, c-nodes of degree one, corresponding to the decoding
scheme mentioned before, ends with an empty graph, or not. That is,
consider the inhomogeneous Markov chain of graphs $\{G(\tau), \tau \ge 0\}$, where
$G(0)$ is a uniformly random element of $\mathcal{G}_l(n,m)$ and for each $\tau = 0, 1, \dots$, if
there is a non-empty set of c-nodes of degree 1, choose one of them (let's say
$a$) uniformly at random, deleting the corresponding edge $(i,a)$ together with
all the edges incident to the v-node $i$. The graph thus obtained is $G(\tau+1)$.
In the opposite case, where there are no c-nodes of degree 1 in $G(\tau)$, we set
$G(\tau+1) = G(\tau)$.


Reducing the state space to $\mathbb{Z}_+^2$. We define furthermore the process
$\{z(\tau) = (z_1(\tau), z_2(\tau)), \tau \ge 0\}$ on $\mathbb{Z}_+^2$, where $z_1(\tau)$ and $z_2(\tau)$ are,
respectively, the number of c-nodes in $G(\tau)$ having degree one or larger than one.
Necessarily, $(n - \hat\tau) l \ge z_1(\tau) + 2 z_2(\tau)$, with equality if $z_2(\tau) = 0$, where
$\hat\tau = \min(\tau, \inf\{\tau' \ge 0 : z_1(\tau') = 0\})$, i.e. $\hat\tau = \tau$ till the first $\tau'$ such that
$z_1(\tau') = 0$, after which $\hat\tau$ is frozen (as the algorithm stops).

Fixing $l \ge 3$, $m$ and $n$, set $z = (z_1, z_2) \in \mathbb{Z}_+^2$ and let $\mathcal{G}(z,\tau)$ denote the
ensemble of possible bipartite graphs with $z_1$ c-nodes of degree one and $z_2$ c-nodes
of degree at least two, after exactly $\tau$ removal steps of this process. Then,
$\mathcal{G}(z,\tau)$ is non-empty only if $z_1 + 2z_2 \le (n-\tau)l$ with equality whenever $z_2 = 0$.
Indeed, each element of $\mathcal{G}(z,\tau)$ is a bipartite graph $G = (U,V; R,S,T; E)$
where $U, V$ are disjoint subsets of $[n]$ with $U \cup V = [n]$ and $R, S, T$ are
disjoint subsets of $[m]$ with $R \cup S \cup T = [m]$, having the cardinalities $|U| = \tau$,
$|V| = n - \tau$, $|R| = m - z_1 - z_2$, $|S| = z_1$, $|T| = z_2$ and the ordered list $E$
of $(n-\tau)l$ edges $(i,a)$ with $i$ a v-node and $a$ a c-node such that each $i \in V$
appears as the first coordinate of exactly $l$ edges in $E$, while each $j \in U$
does not appear in any of the couples in $E$. Similarly, each $c \in R$ does not
appear in $E$, each $b \in S$ appears as the second coordinate of exactly one
edge in $E$, and each $a \in T$ appears in some $k_a \ge 2$ such edges.

The following observation allows us to focus on the much simpler process
$z(\tau)$ on $\mathbb{Z}_+^2$ instead of the graph process $G(\tau) \in \mathcal{G}(z,\tau)$.

Lemma 6.5. Conditional on $\{z(\tau'), 0 \le \tau' \le \tau\}$, the graph $G(\tau)$ is uniformly
distributed over $\mathcal{G}(z,\hat\tau)$. Consequently, the process $\{z(\tau), \tau \ge 0\}$ is an
inhomogeneous Markov process.
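
To make the reduced process concrete, the decimation chain can be simulated directly on an edge list while recording $(z_1, z_2)$ after each removal. This is an illustrative sketch only: the function name `peel_chain` and the 1-based labelling are assumptions, and the implementation favors clarity over speed (each step rescans all c-nodes).

```python
import random


def peel_chain(edges, n, m, rng=random):
    """Run the decimation chain on a list of couples (i, a).

    Returns (traj, core_edges): traj[t] = (z1, z2) after t removal steps, i.e.
    the numbers of c-nodes of degree one and of degree at least two, and
    core_edges is the number of couples left when no degree-one c-node remains.
    """
    v_adj = {i: [] for i in range(1, n + 1)}  # c-nodes attached to each live v-node
    c_adj = {a: [] for a in range(1, m + 1)}  # v-nodes attached to each c-node
    for i, a in edges:
        v_adj[i].append(a)
        c_adj[a].append(i)

    def state():
        z1 = sum(1 for a in c_adj if len(c_adj[a]) == 1)
        z2 = sum(1 for a in c_adj if len(c_adj[a]) >= 2)
        return z1, z2

    traj = [state()]
    while True:
        ones = [a for a in c_adj if len(c_adj[a]) == 1]
        if not ones:
            break  # the chain is frozen from here on
        a = rng.choice(ones)          # uniformly chosen c-node of degree one
        i = c_adj[a][0]               # the unique v-node attached to it
        for b in v_adj.pop(i):        # delete every edge incident to v-node i
            c_adj[b].remove(i)
        traj.append(state())
    core_edges = sum(len(cs) for cs in v_adj.values())
    return traj, core_edges
```

The core is empty exactly when `core_edges` is zero at termination, and along any run the recorded states satisfy $(n-\tau)l \ge z_1 + 2z_2$, in line with the constraint stated above.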

Proof outline: Fixing $\tau$, $z = z(\tau)$ such that $z_1 > 0$, $z' = z(\tau+1)$ and
$G' \in \mathcal{G}(z', \tau+1)$, let $N(G'|z,\tau)$ count the pairs of graphs $G \in \mathcal{G}(z,\tau)$ and
choices of the deleted c-node from $S$ that result in $G'$ upon applying a
single step of our algorithm. Obviously, $G$ and $G'$ must be such that $R \subseteq R'$,
$S \subseteq R' \cup S'$ and $T' \subseteq T$. With $q_0 = |R' \cap S|$, $p_0 = |R' \cap T|$, $q_1 = |S' \cap T|$ and
$q_2$ denoting the number of c-nodes $a \in T'$ for which $k_a > k'_a$, it is shown in
[32, proof of Lemma 3.1] that $(p_0, q_0, q_1, q_2)$ belongs to the subset $\mathcal{D}$ of $\mathbb{Z}_+^4$
where both the relations

$$z_0 = z'_0 - q_0 - p_0, \qquad z_1 = z'_1 + q_0 - q_1, \qquad z_2 = z'_2 + p_0 + q_1, \tag{6.10}$$

for $z_0 = m - z_1 - z_2$, $z'_0 = m - z'_1 - z'_2$, and the inequalities
$(n-\tau)l - (z_1 + 2z_2) \ge l - (2p_0 + q_0 + q_1) \ge q_2$, $q_0 + p_0 \le z'_0$,
$q_1 \le z'_1$ (equivalently, $q_0 \le z_1$), $q_2 \le z'_2$
(equivalently, $p_0 + q_1 + q_2 \le z_2$) hold. In particular $|\mathcal{D}| \le (l+1)^4$. It is further
shown there that

$$N(G'|z,\tau) = (\tau+1)\, l! \sum_{\mathcal{D}} \binom{m - z'_1 - z'_2}{q_0, p_0, \cdot} \binom{z'_1}{q_1} \binom{z'_2}{q_2}\, c_l(q_0, p_0, q_1, q_2) \tag{6.11}$$

depends on $G'$ only via $z'$, where

$$c_l(q_0, p_0, q_1, q_2) = q_0\, \mathrm{coeff}\big[(e^x - 1 - x)^{p_0} (e^x - 1)^{q_1 + q_2},\, x^{l - q_0}\big].$$

We start at $\tau = 0$ with a uniform distribution of $G(0)$ within each possible
ensemble $\mathcal{G}(z(0), 0)$. As $N(G'|z,\tau)$ depends on $G'$ only via $z'$, it follows
by induction on $\tau = 1, 2, \dots$ that conditional on $\{z(\tau'), 0 \le \tau' \le \tau\}$, the
graph $G(\tau)$ is uniformly distributed over $\mathcal{G}(z,\hat\tau)$ as long as $\hat\tau = \tau$. Indeed, if
$z_1(\tau) > 0$, then with $h(z,\tau)$ denoting the number of graphs in $\mathcal{G}(z,\tau)$,

$$\mathbb{P}\{G(\tau+1) = G' \mid \{z(\tau'), 0 \le \tau' \le \tau\}\} = \frac{1}{z_1}\, \frac{N(G'|z(\tau),\tau)}{h(z(\tau),\tau)}$$

is the same for all $G' \in \mathcal{G}(z', \tau+1)$. Moreover, noting that $G(\tau) = G(\hat\tau)$ and
$z(\tau) = z(\hat\tau)$ we deduce that this property extends to the case of $\hat\tau < \tau$ (i.e.
$z_1(\tau) = 0$). Finally, since there are exactly $h(z', \tau+1)$ graphs in the ensemble
$\mathcal{G}(z', \tau+1)$ the preceding implies that $\{z(\tau), \tau \ge 0\}$ is an inhomogeneous
Markov process whose transition probabilities

$$W^+_\tau(\Delta z | z) = \mathbb{P}\{z(\tau+1) = z + \Delta z \mid z(\tau) = z\},$$

for $\Delta z = (\Delta z_1, \Delta z_2)$ and $z'_1 = z_1 + \Delta z_1$, $z'_2 = z_2 + \Delta z_2$ are such that
$W^+_\tau(\Delta z|z) = \mathbb{I}(\Delta z = 0)$ in case $z_1 = 0$, whereas
$W^+_\tau(\Delta z|z) = h(z', \tau+1) N(G'|z,\tau)/(z_1 h(z,\tau))$ when $z_1 > 0$. □

To sample from the uniform distribution on $\mathcal{G}(z,\tau)$ first partition $[n]$
into $U$ and $V$ uniformly at random under the constraints $|U| = \tau$ and $|V| = n - \tau$
(there are $\binom{n}{\tau}$ ways of doing this), and independently partition $[m]$ to
$R \cup S \cup T$ uniformly at random under the constraints $|R| = m - z_1 - z_2$, $|S| = z_1$
and $|T| = z_2$ (of which there are $\binom{m}{z_1, z_2, \cdot}$ possibilities). Then, attribute $l$
v-sockets to each $i \in V$ and number them from $1$ to $(n-\tau)l$ according to
some pre-established convention. Attribute one c-socket to each $a \in S$ and
$k_a$ c-sockets to each $a \in T$, where the $k_a$ are mutually independent Poisson($\zeta$)
random variables conditioned upon $k_a \ge 2$, and further conditioned upon
$\sum_{a \in T} k_a$ being $(n-\tau)l - z_1$. Finally, connect the v-sockets and c-sockets
according to a uniformly random permutation on $(n-\tau)l$ objects, chosen
independently of the $k_a$'s. Consequently,

$$h(z,\tau) = \binom{m}{z_1, z_2, \cdot} \binom{n}{\tau}\, \mathrm{coeff}\big[(e^x - 1 - x)^{z_2},\, x^{(n-\tau)l - z_1}\big]\, ((n-\tau)l)! \tag{6.12}$$



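Approximation by a smooth Markov transition kernel. Though the
transition kernel $W^+_\tau(\cdot|z)$ of the process $z(\cdot)$ is given explicitly via (6.11)
and (6.12), it is hard to get any insight from these formulas, or to use them
directly for finding the probability of this process hitting the line $z_1(\tau) = 0$
at some $\tau < n$ (i.e. of the graph $G(0)$ having a non-empty core). Instead, we
analyze the simpler transition probability kernel

$$\widehat{W}_\theta(\Delta z | x) = \binom{l-1}{q_0 - 1, q_1, q_2}\, p_0^{q_0 - 1}\, p_1^{q_1}\, p_2^{q_2}, \tag{6.13}$$

with $q_0 = -\Delta z_1 - \Delta z_2 \ge 1$, $q_1 = -\Delta z_2 \ge 0$ and $q_2 = l + \Delta z_1 + 2\Delta z_2 \ge 0$,
where

$$p_0 = \frac{x_1}{l(1-\theta)}, \qquad p_1 = \frac{x_2 \lambda^2 e^{-\lambda}}{l(1-\theta)(1 - e^{-\lambda} - \lambda e^{-\lambda})}, \qquad p_2 = 1 - p_0 - p_1, \tag{6.14}$$

for each $\theta \in [0,1)$ and $x \in \mathbb{R}_+^2$ such that $x_1 + 2x_2 \le l(1-\theta)$. In case $x_2 > 0$
we set $\lambda = \lambda(x,\theta)$ as the unique positive solution of

$$\frac{\lambda(1 - e^{-\lambda})}{1 - e^{-\lambda} - \lambda e^{-\lambda}} = \frac{l(1-\theta) - x_1}{x_2}, \tag{6.15}$$

while for $x_2 = 0$ we set by continuity $p_1 = 0$ (corresponding to $\lambda \to \infty$).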

Intuitively, $(p_0, p_1, p_2)$ are the probabilities that each of the remaining $l - 1$
edges emanating from the v-node to be deleted at the $\tau = n\theta$ step of the
algorithm is connected to a c-node of degree 1, 2 and at least 3, respectively.
Indeed, of the $nl(1-\theta)$ v-sockets at that time, precisely $z_1 = nx_1$ are
connected to c-nodes of degree one, hence the formula for $p_0$. Our formula
for $p_1$ corresponds to postulating that the $z_2 = nx_2$ c-nodes of degree at least
two in the collection $T$ follow a Poisson($\lambda$) degree distribution, conditioned
on having degree at least two, setting $\lambda > 0$ to match the expected number
of c-sockets per c-node in $T$ which is given by the right side of (6.15). To
justify this assumption, note that

$$\mathrm{coeff}\big[(e^x - 1 - x)^t,\, x^s\big]\, \lambda^s (e^\lambda - 1 - \lambda)^{-t} = \mathbb{P}\Big(\sum_{i=1}^{t} N_i = s\Big), \tag{6.16}$$

for i.i.d. random variables $N_i$, each having the law of a Poisson($\lambda$) random
variable conditioned to be at least two. We thus get from (6.11) and (6.12),
upon applying the local CLT for such partial sums, that the tight approximation

$$\big| W^+_\tau(\Delta z | z) - \widehat{W}_{\tau/n}(\Delta z | n^{-1} z) \big| \le \frac{C(l,\epsilon)}{n}$$

applies for $(z,\tau) \in Q^+(\epsilon)$, $\Delta z_1 \in \{-l, \dots, l-2\}$, $\Delta z_2 \in \{-(l-1), \dots, 0\}$,
with

$$Q^+(\epsilon) = \{(z,\tau) : 1 \le z_1 ;\ n\epsilon \le z_2 ;\ 0 \le \tau \le n(1-\epsilon) ;\ n\epsilon \le (n-\tau)l - z_1 - 2z_2\},$$

approaching (as $\epsilon \downarrow 0$) the set $Q^+(0) \subset \mathbb{Z}^3$ in which the trajectory $(z(\tau),\tau)$
evolves till hitting one of its absorbing states $\{(z,\tau) : z_1(\tau) = 0, \tau \le n\}$ (c.f.
[32, Lemma 4.5] for the proof, where the restriction to $Q^+(\epsilon)$ guarantees
that the relevant values of $t$ in (6.16) are of order $n$).
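
The kernel (6.13)-(6.15) is easy to evaluate numerically, which may help in reading the formulas. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the root $\lambda$ of (6.15) is found by plain bisection (a choice made here, not one taken from the references), and the right side of (6.15) is assumed to exceed 2 by a small margin so that the root lies inside the bracket.

```python
import math


def solve_lambda(c, iters=200):
    """Positive root of lam*(1-e^-lam)/(1-e^-lam-lam*e^-lam) = c, assuming c > 2.001."""
    def mean_cond(lam):  # mean of a Poisson(lam) variable conditioned to be >= 2
        e = math.exp(-lam)
        return lam * (1.0 - e) / (1.0 - e - lam * e)

    lo, hi = 1e-3, c + 1.0  # mean_cond is increasing and mean_cond(lam) > lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_cond(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kernel_probs(x1, x2, theta, l):
    """(p0, p1, p2) of (6.14) at the rescaled state (x1, x2) and time theta."""
    s = l * (1.0 - theta)  # remaining v-sockets, per unit of n
    p0 = x1 / s
    if x2 == 0:
        p1 = 0.0  # continuity convention stated after (6.15)
    else:
        lam = solve_lambda((s - x1) / x2)
        e = math.exp(-lam)
        p1 = x2 * lam * lam * e / (s * (1.0 - e - lam * e))
    return p0, p1, 1.0 - p0 - p1


def kernel_pmf(dz1, dz2, probs, l):
    """Multinomial weight (6.13) of the increment (dz1, dz2); zero off its support."""
    p0, p1, p2 = probs
    q0, q1, q2 = -dz1 - dz2, -dz2, l + dz1 + 2 * dz2
    if q0 < 1 or q1 < 0 or q2 < 0:
        return 0.0
    coef = math.factorial(l - 1) // (
        math.factorial(q0 - 1) * math.factorial(q1) * math.factorial(q2))
    return coef * p0 ** (q0 - 1) * p1 ** q1 * p2 ** q2
```

Summing `kernel_pmf` over $\Delta z_1 \in \{-l, \dots, l-2\}$ and $\Delta z_2 \in \{-(l-1), \dots, 0\}$ gives one, and the mean increment is $(-1 + (l-1)(p_1 - p_0),\, -(l-1)p_1)$, as expected for a multinomial law on the remaining $l - 1$ sockets.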

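The initial distribution. Considering $m = \lfloor n\rho \rfloor$, for $\rho = l/\gamma \in [\epsilon, 1/\epsilon]$ and
large $n$, recall that

$$\mathbb{P}_{n,\rho}(z(0) = z) = \frac{\mathbb{P}\{S_m = (z_1, z_2, nl)\}}{\mathbb{P}\{S_m^{(3)} = nl\}},$$

where $S_m = \sum_{i=1}^{m} X_i$ for $X_i = (\mathbb{I}_{N_i = 1}, \mathbb{I}_{N_i \ge 2}, N_i) \in \mathbb{Z}_+^3$ and $N_i$ that are
i.i.d. Poisson($\gamma$) random variables (so $\mathbb{E} S_m^{(3)} = nl$ up to the quantization
error of at most $\gamma$). Hence, using sharp local CLT estimates for $S_m$ we find
that the law of $z(0)$ is well approximated by the multivariate Gaussian law
$G_2(\cdot | ny(0); nQ(0))$ whose mean $ny(0) = ny(0;\rho)$ consists of the first two
coordinates of $n\rho\, \mathbb{E} X_1$, that is,

$$y(0;\rho) = \rho\,(\gamma e^{-\gamma},\ 1 - e^{-\gamma} - \gamma e^{-\gamma}), \tag{6.17}$$

and its positive definite covariance matrix $nQ(0;\rho)$ equals $n\rho$ times the
conditional covariance of the first two coordinates of $X_1$ given its third
coordinate. That is,

$$\begin{aligned}
Q_{11}(0) &= \frac{l}{\gamma}\, \gamma e^{-2\gamma} (e^{\gamma} - 1 + \gamma - \gamma^2), \\
Q_{12}(0) &= -\frac{l}{\gamma}\, \gamma e^{-2\gamma} (e^{\gamma} - 1 - \gamma^2), \\
Q_{22}(0) &= \frac{l}{\gamma}\, e^{-2\gamma} \big[(e^{\gamma} - 1) + \gamma (e^{\gamma} - 2) - \gamma^2 (1 + \gamma)\big].
\end{aligned} \tag{6.18}$$

More precisely, as shown for example in [32, Lemma 4.4], for all $n$, $r$ and
$\rho \in [\epsilon, 1/\epsilon]$,

$$\sup_{u \in \mathbb{R}^2} \sup_{x \in \mathbb{R}} \big| \mathbb{P}_{n,\rho}\{u \cdot z \le x\} - G_2(u \cdot z \le x \,|\, ny(0); nQ(0)) \big| \le \kappa(\epsilon)\, n^{-1/2}. \tag{6.19}$$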
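
As a sanity check on the entries of $Q(0)$, one can compute the covariance of the two indicator coordinates after regressing out the third coordinate, and compare with the closed forms. The sketch below assumes that "conditional covariance" is meant in the Gaussian (Schur complement) sense that a local CLT produces; the function names are hypothetical, and both functions return the matrix per c-node, i.e. without the prefactor $\rho = l/\gamma$.

```python
import math


def conditional_cov(g, kmax=80):
    """Schur-complement covariance of (1{N=1}, 1{N>=2}) given N, for N ~ Poisson(g).

    Moments of X = (1{N=1}, 1{N>=2}, N) are obtained by direct summation over
    k = 0..kmax, which is ample for moderate values of g.
    """
    mean = [0.0] * 3
    second = [[0.0] * 3 for _ in range(3)]
    for k in range(kmax + 1):
        w = math.exp(-g) * g ** k / math.factorial(k)
        x = (1.0 if k == 1 else 0.0, 1.0 if k >= 2 else 0.0, float(k))
        for a in range(3):
            mean[a] += w * x[a]
            for b in range(3):
                second[a][b] += w * x[a] * x[b]
    cov = [[second[a][b] - mean[a] * mean[b] for b in range(3)] for a in range(3)]
    # Cov(X_a, X_b) - Cov(X_a, N) Cov(N, X_b) / Var(N), for a, b in the first two slots
    return [[cov[a][b] - cov[a][2] * cov[2][b] / cov[2][2] for b in range(2)]
            for a in range(2)]


def closed_form(g):
    """Bracketed expressions of the covariance entries, without the factor l/g."""
    e2, eg = math.exp(-2.0 * g), math.exp(g)
    q11 = g * e2 * (eg - 1.0 + g - g * g)
    q12 = -g * e2 * (eg - 1.0 - g * g)
    q22 = e2 * ((eg - 1.0) + g * (eg - 2.0) - g * g * (1.0 + g))
    return [[q11, q12], [q12, q22]]
```

The two matrices agree to numerical precision for moderate $\gamma$, which supports reading the three entries as the Schur complement of the Poisson moments.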

Absence of small cores. A considerable simplification comes from the
observation that a typical large random hyper-graph does not have a non-empty
core of size below a certain threshold. Indeed, a subset of v-nodes of
a hyper-graph is called a stopping set if the restriction of the hyper-graph to
this subset has no c-node of degree one. With $N(s,r)$ counting the number
of stopping sets in our random hyper-graph which involve exactly $s$ v-nodes
and $r$ c-nodes, observe that necessarily $r \le \lfloor ls/2 \rfloor$. Further, adapting a result
of [79] (and its proof) to our graph ensemble, it is shown in [32, Lemma 4.7]
that for $l \ge 3$ and any $\epsilon > 0$ there exist $\kappa = \kappa(l,\epsilon) > 0$ and $C = C(l,\epsilon)$
finite, such that for any $m \ge \epsilon n$

$$\mathbb{E}\Big[ \sum_{s=1}^{m\kappa} \sum_{r=1}^{\lfloor ls/2 \rfloor} N(s,r) \Big] \le C m^{1 - l/2}.$$

Since the core is the stopping set including the maximal number of v-nodes,
this implies that a random hyper-graph from the ensemble $\mathcal{G}_l(n,m)$ has a
non-empty core of less than $m\kappa$ v-nodes with probability that is at most
$C m^{1-l/2}$ (alternatively, the probability of having a non-empty core with less
than $n\kappa$ v-nodes is at most $C n^{1-l/2}$).



6.4. The ODE method and the critical value

In view of the approximations of Section 6.3 the asymptotics of $P_l(n,\rho)$
reduces to determining the probability $\widehat{\mathbb{P}}_{n,\rho}(z_1(\tau) = 0 \text{ for some } \tau < n)$
that the inhomogeneous Markov chain on $\mathbb{Z}_+^2$ with the transition kernel
$\widehat{W}_{\tau/n}(\Delta z | n^{-1} z)$ of (6.13) and the initial distribution $G_2(\cdot | ny(0); nQ(0))$, hits
the line $z_1(\tau) = 0$ for some $\tau < n$.

The functions $(x,\theta) \mapsto p_a(x,\theta)$, $a = 0, 1, 2$ are of Lipschitz continuous
partial derivatives on each of the compact subsets

$$q^+(\epsilon) = \{(x,\theta) : 0 \le x_1 ;\ 0 \le x_2 ;\ \theta \in [0, 1-\epsilon] ;\ 0 \le (1-\theta)l - x_1 - 2x_2\},$$

of $\mathbb{R}^2 \times \mathbb{R}_+$ where the rescaled (macroscopic) state and time variables $x = n^{-1} z$
and $\theta = \tau/n$ lie whenever $(z,\tau) \in Q^+(\epsilon)$. As a result, the transition
kernels of (6.13) can be extended to any $x \in \mathbb{R}^2$ such that for some $L = L(l,\epsilon)$
finite, any $\theta, \theta' \in [0, 1-\epsilon]$ and $x, x' \in \mathbb{R}^2$

$$\| \widehat{W}_{\theta'}(\cdot | x') - \widehat{W}_\theta(\cdot | x) \|_{TV} \le L \big( \|x' - x\| + |\theta' - \theta| \big)$$

(with $\|\cdot\|_{TV}$ denoting the total variation norm and $\|\cdot\|$ the Euclidean norm
in $\mathbb{R}^2$).

So, with the approximating chain of kernel $\widehat{W}_\theta(\Delta z | x)$ having bounded
increments ($= \Delta z$), and its transition probabilities depending smoothly on
$(x,\theta)$, the scaled process $n^{-1} z(\theta n)$ concentrates around the solution of the
ODE

$$\frac{dy}{d\theta}(\theta) = F(y(\theta),\theta), \tag{6.20}$$

starting at $y(0)$ of (6.17), where $F(x,\theta) = (-1 + (l-1)(p_1 - p_0),\ -(l-1)p_1)$
is the mean of $\Delta z$ under the transitions of (6.13). This is shown for instance
in [23, 62, 72].
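
The ODE (6.20) can be integrated numerically as a check. The sketch below is illustrative only: it uses a classical fourth-order Runge-Kutta step and a bisection solver for $\lambda$ (both choices made here, with hypothetical function names), and it compares the result with the closed form $y_1 = l u^{l-1} h_\rho(u)$, $u = (1-\theta)^{1/l}$, quoted later in this subsection.

```python
import math


def solve_lambda(c, iters=100):
    """Positive root of lam*(1-e^-lam)/(1-e^-lam-lam*e^-lam) = c, assuming c > 2.001."""
    lo, hi = 1e-3, c + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        e = math.exp(-mid)
        if mid * (1.0 - e) / (1.0 - e - mid * e) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def drift(y, theta, l):
    """F(y, theta) = (-1 + (l-1)(p1 - p0), -(l-1) p1), assuming y2 > 0."""
    y1, y2 = y
    s = l * (1.0 - theta)
    p0 = y1 / s
    lam = solve_lambda((s - y1) / y2)
    e = math.exp(-lam)
    p1 = y2 * lam * lam * e / (s * (1.0 - e - lam * e))
    return (-1.0 + (l - 1) * (p1 - p0), -(l - 1) * p1)


def integrate(rho, l, theta_end, steps=2000):
    """RK4 integration of dy/dtheta = F(y, theta) from the initial mean y(0; rho)."""
    g = l / rho
    y = (rho * g * math.exp(-g), rho * (1.0 - math.exp(-g) - g * math.exp(-g)))
    h = theta_end / steps
    theta = 0.0
    for _ in range(steps):
        k1 = drift(y, theta, l)
        k2 = drift((y[0] + 0.5 * h * k1[0], y[1] + 0.5 * h * k1[1]), theta + 0.5 * h, l)
        k3 = drift((y[0] + 0.5 * h * k2[0], y[1] + 0.5 * h * k2[1]), theta + 0.5 * h, l)
        k4 = drift((y[0] + h * k3[0], y[1] + h * k3[1]), theta + h, l)
        y = (y[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             y[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
        theta += h
    return y


def y1_closed(theta, rho, l):
    """Closed-form first coordinate l * u^(l-1) * h_rho(u) with u = (1-theta)^(1/l)."""
    g = l / rho
    u = (1.0 - theta) ** (1.0 / l)
    return l * u ** (l - 1) * (u - 1.0 + math.exp(-g * u ** (l - 1)))
```

For a density comfortably above the critical one (say $l = 3$, $\rho = 1.5$) the integration stays away from the line $y_1 = 0$, so the drift is well defined along the whole path and the two values can be compared directly.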

We note in passing that this approach of using a deterministic ODE as an 
asymptotic approximation for slowly varying random processes goes back 
at least to [59], and such degenerate (or zero-one) fluid-limits have been 
established for many other problems. For example, this was done in [55] for 
the largest possible matching and in [80] for the size of r-core of random 
graphs (c.f. [73] for a general approach for deriving such results without 
recourse to ODE approximations). 

Setting $h_\rho(u) = u - 1 + \exp(-\gamma u^{l-1})$, with a bit of real analysis one
verifies that for $\gamma = l/\rho$ finite, the ODE (6.20) admits a unique solution
$y(\theta;\rho)$ subject to the initial condition (6.17) such that $y_1(\theta;\rho) = l u^{l-1} h_\rho(u)$
for $u(\theta) = (1-\theta)^{1/l}$, as long as $h_\rho(u(\theta)) \ge 0$. Thus, if $\rho$ exceeds the finite
and positive critical density

$$\rho_d = \inf\{\rho > 0 : h_\rho(u) > 0 \ \ \forall u \in (0,1]\},$$

then $y_1(\theta;\rho)$ is strictly positive for all $\theta \in [0,1)$, while for any $\rho \le \rho_d$ the
solution $y(\theta;\rho)$ first hits the line $y_1 = 0$ at some $\theta_*(\rho) < 1$.
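
The critical density can be evaluated numerically. Since $h_\rho(u) > 0$ on $(0,1)$ is equivalent to $\gamma < -\log(1-u)/u^{l-1}$ there (and $h_\rho(1) > 0$ always), one has $\rho_d = l/\gamma_d$ with $\gamma_d$ the minimum of $-\log(1-u)/u^{l-1}$ over $u \in (0,1)$. The sketch below uses a ternary search, assuming $l \ge 3$ so that this function is unimodal; the function name is hypothetical.

```python
import math


def critical_density(l, iters=200):
    """Return (rho_d, u_star), where u_star minimizes g(u) = -log(1-u)/u^(l-1)
    on (0, 1) and rho_d = l / g(u_star). Assumes l >= 3 (g is then unimodal)."""
    def g(u):
        return -math.log1p(-u) / u ** (l - 1)

    lo, hi = 1e-9, 1.0 - 1e-12
    for _ in range(iters):  # ternary search for the minimizer
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if g(a) < g(b):
            hi = b
        else:
            lo = a
    u_star = 0.5 * (lo + hi)
    return l / g(u_star), u_star
```

For $l = 3$ this gives $\rho_d \approx 1.2218$, i.e. $1/\rho_d \approx 0.8185$ equations per variable in the XORSAT language, with the tangency point at $u \approx 0.715$.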

Returning to the XORSAT problem, [23, 72] prove that for a uniformly
chosen linear system with $n$ equations and $m = \rho n$ variables the leaf removal
algorithm is successful with high probability if $\rho > \rho_d$ and fails with
high probability if $\rho < \rho_d$. See [32, Figure 1] for an illustration of this
phenomenon. Similarly, in the context of decoding of a noisy message over the
binary erasure channel (i.e. uniqueness of the solution for a given linear system
over GF(2)), [62] show that with high probability this algorithm successfully
decimates the whole hyper-graph without ever running out of degree
one vertices if $\rho > \rho_d$. Vice versa, for $\rho < \rho_d$, the solution $y(\theta;\rho)$ crosses
the $y_1 = 0$ plane near which point the algorithm stops with high probability
and returns a core of size $O(n)$. The value of $\rho$ translates into noise level
in this communication application, so [62] in essence explicitly characterizes
the critical noise value, for a variety of codes (i.e. random hyper-graph
ensembles). Though this result has been successfully used for code design, 
it is often a poor approximation for the moderate code block-length (say, 
n = 10^ to 10^) that are relevant in practice. 

The first order phase transition in the size of the core at $\rho = \rho_d$ where
it abruptly changes from an empty core for $\rho > \rho_d$ to a core whose size is
a positive fraction of $n$ for $\rho < \rho_d$, has other important implications. For
example, as shown in [23, 72] and explained before, the structure of the
set of solutions of the linear system changes dramatically at $\rho_d$, exhibiting
a 'clustering effect' when $\rho < \rho_d$. More precisely, a typical instance of our
ensemble has a core that corresponds to $n(1 - \theta_*(\rho)) + o(n)$ equations in
$n y_2(\theta_*(\rho)) + o(n)$ variables. The approximately $2^{m-n}$ solutions of the original
linear system partition to about $2^{n\xi(\rho)}$ clusters according to their projection
on the core, such that the distance between each pair of clusters is $O(n)$. This
analysis also determines the location $\rho_s$ of the satisfiability phase transition.
That is, as long as $\xi(\rho) = y_2(\theta_*(\rho)) - (1 - \theta_*(\rho))$ is positive, with high
probability the original system is solvable (i.e. the problem is satisfiable),
whereas when $\xi(\rho) < 0$ it is non-solvable with high probability.

We conclude this subsection with a 'cavity type' direct prediction of the
value of $\rho_d$ without reference to a peeling algorithm (or any other stochastic
dynamic). To this end, we set $u$ to denote the probability that a typical
c-node of $\mathcal{G}_l(n,m)$, say $a$, is part of the core. If this is the case, then a
hyper-edge $i$ incident to $a$ is also part of the core iff all other $l - 1$ sockets
of $i$ are connected to c-nodes from the core. Using the Bethe ansatz we
consider the latter to be the intersection of $l - 1$ independent events, each of
probability $u$. So, with probability $u^{l-1}$ a hyper-edge $i$ incident to $a$ from
the core, is also in the core. As already seen, a typical c-node in our graph
ensemble has Poisson($\gamma$) hyper-edges incident to it, hence Poisson($\gamma u^{l-1}$) of
them shall be from the core. Recall that a c-node belongs to the core iff at
least one hyper-edge incident to it is in the core. By self-consistency, this
yields the identity $u = 1 - \exp(-\gamma u^{l-1})$, or alternatively, $h_\rho(u) = 0$. As
we have already seen, the existence of $u \in (0,1]$ for which $h_\rho(u) = 0$ is
equivalent to $\rho \le \rho_d$.
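
The self-consistency equation can be explored by direct iteration. The following sketch starts from $u = 1$ and iterates the map $u \mapsto 1 - \exp(-\gamma u^{l-1})$, which is increasing in $u$, so the iterates decrease to the largest fixed point in $[0,1]$. The function name and the fixed iteration count are choices made for illustration; convergence becomes slow very close to the critical density.

```python
import math


def core_fraction(rho, l, iters=10000):
    """Largest solution in [0, 1] of u = 1 - exp(-gamma * u^(l-1)), gamma = l / rho,
    approximated by monotone iteration started at u = 1."""
    gamma = l / rho
    u = 1.0
    for _ in range(iters):
        u = 1.0 - math.exp(-gamma * u ** (l - 1))
    return u
```

For $l = 3$ the iteration settles at a strictly positive value when $\rho = 1.0$ (below the critical density, roughly $1.22$ for $l = 3$) and collapses to zero when $\rho = 1.5$, matching the dichotomy described above.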

6.5. Diffusion approximation and scaling window

As mentioned before, the ODE asymptotics as in [62] is of limited value
for decoding with code block-lengths that are relevant in practice. For this
reason, [11] go one step further and, using a diffusion approximation, provide
the probability of successful decoding in the double limit of large size $n$ and
noise level approaching the critical value (i.e. taking $\rho_n \to \rho_d$). The resulting
asymptotic characterization is of finite-size scaling type.

Finite-size scaling has been the object of several investigations in statistical
physics and in combinatorics. Most of these studies estimate the size
of the corresponding scaling window. That is, fixing a small value of $\epsilon > 0$,
they find the amount of change in some control parameter which moves the
probability of a relevant event from $\epsilon$ to $1 - \epsilon$. A remarkably general result
in this direction is the rigorous formulation of a 'Harris criterion' in [22, 86].
Under mild assumptions, this implies that the scaling window has to be at
least $\Omega(n^{-1/2})$ for a properly defined control parameter (for instance, the ratio
$\rho$ of the number of nodes to hyper-edges in our problem). A more precise
result has recently been obtained for the satisfiable-unsatisfiable phase transition
for the random 2-SAT problem, yielding a window of size $\Theta(n^{-1/3})$
[18]. Note however that statistical physics arguments suggest that the phase
transition we consider here is not from the same universality class as the
satisfiable-unsatisfiable transition for the random 2-SAT problem.

If we fix $\rho > 0$, the fluctuations of $z(n\theta)$ around $ny(\theta)$ are accumulated
in $n\theta$ stochastic steps, hence are of order $\sqrt{n}$. Further, applying the classical
Stroock-Varadhan martingale characterization technique, one finds that
the rescaled variable $(z(n\theta) - ny(\theta))/\sqrt{n}$ converges in law as $n \to \infty$ to a
Gaussian random variable whose covariance matrix $Q(\theta;\rho) = \{Q_{ab}(\theta;\rho);\ 1 \le a,b \le 2\}$
is the symmetric positive definite solution of the ODE:

$$\frac{dQ(\theta)}{d\theta} = G(y(\theta),\theta) + A(y(\theta),\theta)\, Q(\theta) + Q(\theta)\, A(y(\theta),\theta)^{\dagger} \tag{6.21}$$

(c.f. [11]). Here $A(x,\theta) = \{A_{ab}(x,\theta) = \partial_{x_b} F_a(x,\theta);\ 1 \le a,b \le 2\}$ is
the matrix of derivatives of the drift term for the mean ODE (6.20) and
$G(x,\theta) = \{G_{ab}(x,\theta) : a,b \in \{1,2\}\}$ is the covariance of $\Delta z$ at $(x,\theta)$ under
the transition kernel (6.13). That is, the non-negative definite symmetric
matrix with entries

$$\begin{aligned}
G_{11}(x,\theta) &= (l-1)\big[p_0 + p_1 - (p_0 - p_1)^2\big], \\
G_{12}(x,\theta) &= -(l-1)\big[p_0 p_1 + p_1 (1 - p_1)\big], \\
G_{22}(x,\theta) &= (l-1)\, p_1 (1 - p_1).
\end{aligned} \tag{6.22}$$

The dependence of $Q(\theta) = Q(\theta;\rho)$ on $\rho$ is via the positive definite initial
condition $Q(0;\rho)$ of (6.18) for the ODE (6.21) as well as the terms $y(\theta) = y(\theta;\rho)$
that appear in its right side.

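Focusing hereafter on the critical case $\rho = \rho_d$, there exists then a unique
critical time $\theta_d = \theta_*(\rho_d)$ in $(0,1)$ with $y_1(\theta_d) = y'_1(\theta_d) = 0$ and $y''_1(\theta_d) > 0$,
while the smooth solution $\theta \mapsto y_1(\theta;\rho_d)$ is positive when $\theta \ne \theta_d$ and $\theta \ne 1$
(for more on $y(\cdot;\cdot)$ see [32, Proposition 4.2]).

For $\rho_n = \rho_d + r n^{-1/2}$ the leading contribution to $P_l(n,\rho_n)$ is the probability
$\widehat{\mathbb{P}}_{n,\rho_n}(z_1(n\theta_d) \le 0)$ for the inhomogeneous Markov chain $z(\tau)$ on $\mathbb{Z}_+^2$
with transition kernel $\widehat{W}_{\tau/n}(\Delta z | n^{-1} z)$ of (6.13) and the initial distribution
$G_2(\cdot | ny(0); nQ(0))$ at $\rho = \rho_n$. To estimate this contribution, note that
$y_1(\theta_d;\rho_d) = 0$, hence

$$y_1(\theta_d;\rho_n) = r n^{-1/2} \Big[ \frac{\partial y_1}{\partial \rho}(\theta_d;\rho_d) + o(1) \Big].$$

Thus, setting $a_l = \frac{\partial y_1}{\partial \rho} / \sqrt{Q_{11}}$, both evaluated at $\theta = \theta_d$ and $\rho = \rho_d$, by the
preceding Gaussian approximation

$$P_l(n,\rho_n) = \widehat{\mathbb{P}}_{n,\rho_n}(z_1(n\theta_d) \le 0) + o(1) = G_1(-r a_l) + o(1), \tag{6.23}$$

as shown in [11]. In particular, the phase transition scaling window around
$\rho = \rho_d$ is of size $\Theta(n^{-1/2})$.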

In a related work, [27] determine the asymptotic core size for a random
hyper-graph from an ensemble which is the 'dual' of $\mathcal{G}_l(n,m)$. In their model
the hyper-edges (i.e. v-nodes) are of random, Poisson distributed sizes, which
allows for a particularly simple Markovian description of the peeling algo- 
rithm that constructs the core. Dealing with random hyper-graphs at the 
critical point, where the asymptotic core size exhibits a discontinuity, they 
describe the fluctuations around the deterministic limit via a certain linear 
SDE. In doing so, they heavily rely on the powerful theory of weak conver- 
gence, in particular in the context of convergence of Markov processes. For 
further results that are derived along this line of reasoning, see [26, 45, 46]. 

6.6. Finite size scaling correction to the critical value 

In contrast with the preceding and closer in level of precision to that for
the scaling behavior in the emergence of the giant component in Erdős-Rényi
random graphs (see [53] and references therein), for $\mathcal{G}_l(n,m)$ and
$\rho_n = \rho_d + r n^{-1/2}$ inside the scaling window, it is conjectured in [11] and
proved in [32] that the leading correction to the diffusion approximation for
$P_l(n,\rho_n)$ is of order $\Theta(n^{-1/6})$. Comparing this finite size scaling expression
with numerical simulations, as illustrated in [32, Figure 2], we see that it is
very accurate even at $n \approx 100$.

Such a finite size scaling result is beyond the scope of weak convergence
theory, and while its proof involves delicate coupling arguments, expanding
and keeping track of the rate of decay of approximation errors (in terms of
$n$), similar results are expected for other phase transitions within the same
class, such as $k$-core percolation on random graphs (with $k \ge 3$), or the pure
literal rule threshold in random $k$-SAT (with $k \ge 3$, c.f. [41]). In a different
direction, the same approach provides rates of convergence (in the sup-norm)
as $n$ grows, for distributions of many inhomogeneous Markov chains on $\mathbb{R}^d$
whose transition kernels $W_{t,n}(x_{t+1} - x_t = y \,|\, x_t = x)$ are approximately (in
$n$) linear in $x$, and "strongly-elliptic" of uniformly bounded support with
respect to $y$.

As a first step in proving the finite size scaling, the following refinement 
of the left hand side of (6.23) is provided in [32, Section 5]. 

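Proposition 6.6. Let $w \in (3/4, 1)$, $J_n = [n\theta_d - n^w, n\theta_d + n^w]$ and
$|\rho - \rho_d| \le n^{w'-1}$ with $w' < 2w - 1$. Then, for $\epsilon_n = A \log n$ and $\delta_n = D n^{-1/2} (\log n)^2$,

$$\widehat{\mathbb{P}}_{n,\rho}\Big\{ \inf_{\tau \in J_n} z_1(\tau) \le -\epsilon_n \Big\} - \delta_n \le P_l(n,\rho) \le \widehat{\mathbb{P}}_{n,\rho}\Big\{ \inf_{\tau \in J_n} z_1(\tau) \le \epsilon_n \Big\} + \delta_n. \tag{6.24}$$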

At the critical point (i.e. for $\rho = \rho_d$ and $\theta = \theta_d$) the solution of the ODE
(6.20) is tangent to the $y_1 = 0$ plane and fluctuations in the $y_1$ direction
determine whether a non-empty (hence, large) core exists or not. Further, in
a neighborhood of $\theta_d$ we have $y_1(\theta) \simeq \frac{1}{2} \widetilde{F} (\theta - \theta_d)^2$, for the positive constant

$$\widetilde{F} = \frac{d^2 y_1}{d\theta^2}(\theta_d;\rho_d) = \frac{dF_1}{d\theta}(y(\theta_d;\rho_d),\theta_d) = \frac{\partial F_1}{\partial \theta} + \frac{\partial F_1}{\partial x_2} F_2 \tag{6.25}$$

(omitting hereafter arguments that refer to the critical point). In the same
neighborhood, the contribution of fluctuations to $z_1(n\theta) - z_1(n\theta_d)$ is approximately
$\sqrt{\widetilde{G}\, n\, |\theta - \theta_d|}$, with $\widetilde{G} = G_{11}(y(\theta_d;\rho_d),\theta_d) > 0$. Comparing these two
contributions we see that the relevant scaling is
$X_n(t) = n^{-1/3} [z_1(n\theta_d + n^{2/3} t) - z_1(n\theta_d)]$, which as shown in [32, Section 6]
converges for large $n$, by strong approximation, to
$X(t) = \frac{1}{2} \widetilde{F} t^2 + \sqrt{\widetilde{G}}\, W(t)$, for a standard two-sided
Brownian motion $W(t)$ (with $W(0) = 0$). That is,

Proposition 6.7. Let $\xi(r)$ be a normal random variable of mean $(\partial y_1 / \partial \rho)\, r$
and variance $Q_{11}$ (both evaluated at $\theta = \theta_d$ and $\rho = \rho_d$), which is independent
of $W(t)$.

For some $w \in (3/4, 1)$, any $\eta < 5/26$, all $A > 0$, $r \in \mathbb{R}$ and $n$ large
enough, if $\rho_n = \rho_d + r n^{-1/2}$ and $\epsilon_n = A \log n$, then

$$\Big| \widehat{\mathbb{P}}_{n,\rho_n}\Big\{ \inf_{\tau \in J_n} z_1(\tau) \le \pm \epsilon_n \Big\} - \mathbb{P}\Big\{ n^{1/6} \xi + \inf_t X(t) \le 0 \Big\} \Big| \le n^{-\eta}. \tag{6.26}$$

We note in passing that within the scope of weak convergence Aldous pioneered
in [8] the use of Brownian motion with quadratic drift (à la $X(t)$ of
Proposition 6.7), to examine the near-critical behavior of the giant component
in Erdős-Rényi random graphs, and his method was extended in [46] to
the giant set of identifiable vertices in Poisson random hyper-graph models.

Combining Propositions 6.6 and 6.7 we estimate $P_l(n,\rho_n)$ in terms of the
distribution of the global minimum of the process $\{X(t)\}$. The latter has
been determined already in [50], yielding the following conclusion.

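Theorem 6.8. For $l \ge 3$ set $a_l = \frac{\partial y_1}{\partial \rho} / \sqrt{Q_{11}}$, $b_l = \frac{1}{\sqrt{Q_{11}}} \widetilde{G}^{2/3} \widetilde{F}^{-1/3}$ and
$\rho_n = \rho_d + r n^{-1/2}$. Then, for any $\eta < 5/26$

$$P_l(n,\rho_n) = G_1(-r a_l) + b_l\, \kappa\, G'_1(-r a_l)\, n^{-1/6} + O(n^{-\eta}), \tag{6.27}$$

for $\kappa = \int_0^\infty [1 - \mathcal{K}(z)^2]\, dz$ and an explicit function $\mathcal{K}(\cdot)$ (see [32, equation
(2.17)]).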

Proof outline. Putting together Propositions 6.6 and 6.7, we get that

$$P_l(n,\rho_n) = \mathbb{P}\Big\{ n^{1/6} \xi + \inf_t X(t) \le 0 \Big\} + O(n^{-\eta}).$$

By Brownian scaling, $X(t) = \widetilde{F}^{-1/3} \widetilde{G}^{2/3} \widetilde{X}(\widetilde{F}^{2/3} \widetilde{G}^{-1/3} t)$, where
$\widetilde{X}(t) = \frac{1}{2} t^2 + \widetilde{W}(t)$ and $\widetilde{W}(t)$ is also a two-sided standard Brownian motion. With
$Z = \inf_t \widetilde{X}(t)$, and $Y$ a standard normal random variable which is independent
of $\widetilde{X}(t)$, we clearly have that

$$\begin{aligned}
P_l(n,\rho_n) &= \mathbb{P}\Big\{ n^{1/6} \frac{\partial y_1}{\partial \rho}\, r + n^{1/6} \sqrt{Q_{11}}\, Y + \widetilde{F}^{-1/3} \widetilde{G}^{2/3} Z \le 0 \Big\} + O(n^{-\eta}) \\
&= \mathbb{E}\big\{ G_1(-r a_l - b_l n^{-1/6} Z) \big\} + O(n^{-\eta}).
\end{aligned} \tag{6.28}$$

From [50, Theorem 3.1] we deduce that $Z$ has the continuous distribution
function $F_Z(z) = 1 - \mathcal{K}(-z)^2\, \mathbb{I}(z < 0)$, resulting after integration by parts
with the explicit formula for $\kappa = -\mathbb{E} Z$ (and where [50, (5.2)] provides the
explicit expression of [32, formula (2.17)] for $\mathcal{K}(x)$). Further, as shown in
[32, proof of Theorem 2.3] all moments of $Z$ are finite and the proof is thus
completed by a first order Taylor expansion of $G_1(\cdot)$ in (6.28) around $-r a_l$.
□
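
The Brownian scaling step used in the proof outline can be checked in two lines. The following worked computation is a sketch in which the tilde notation for the constants of (6.25) and of the limit process, and the abbreviations $a$ and $b$, are introduced here for convenience.

```latex
% Abbreviate a = \widetilde{F}^{2/3}\widetilde{G}^{-1/3}, b = \widetilde{F}^{-1/3}\widetilde{G}^{2/3},
% and define \widetilde{W}(s) = \sqrt{a}\, W(s/a), again a standard two-sided Brownian motion.
\begin{align*}
b\,\widetilde{X}(a t)
  &= \tfrac{1}{2}\, b a^{2}\, t^{2} + b\,\widetilde{W}(a t)
   = \tfrac{1}{2}\, b a^{2}\, t^{2} + b \sqrt{a}\; W(t), \\
b a^{2} &= \widetilde{F}^{-1/3+4/3}\, \widetilde{G}^{2/3-2/3} = \widetilde{F},
\qquad
b \sqrt{a} = \widetilde{F}^{-1/3+1/3}\, \widetilde{G}^{2/3-1/6} = \sqrt{\widetilde{G}} .
\end{align*}
% Hence b\,\widetilde{X}(a t) = \tfrac12 \widetilde{F} t^2 + \sqrt{\widetilde{G}}\, W(t) = X(t),
% and \inf_t X(t) = b \inf_t \widetilde{X}(t) = \widetilde{F}^{-1/3}\widetilde{G}^{2/3} Z.
```

Dividing the event inside (6.28) by $n^{1/6} \sqrt{Q_{11}}$ then produces the argument $-r a_l - b_l n^{-1/6} Z$ of $G_1$.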

Remark 6.9. The simulations in [32, Figure 2] suggest that the approximation
of $P_l(n,\rho_n)$ we provide in (6.27) is more accurate than the $O(n^{-5/26+\epsilon})$
correction term suggests. Our proof shows that one cannot hope for a better
error estimate than $\Theta(n^{-1/3})$ as we neglect the second order term in expanding
$G_1(-r a_l + C n^{-1/6})$, see (6.28). We believe this is indeed the order of the
next term in the expansion (6.27). Determining its form is an open problem.

Remark 6.10. Consider the (time) evolution of the core for the hyper-graph
process where one hyper-edge is added uniformly at random at each
time step. That is, $n$ increases with time, while the number of vertices $m$ is
kept fixed. Let $S(n)$ be the corresponding (random) number of hyper-edges in
the core of the hyper-graph at time $n$ and $n_d = \min\{n : S(n) \ge 1\}$ the onset
of a non-empty core. Recall that small cores are absent for a typical large
random hyper-graph, whereas fixing $\rho < \rho_d$ the probability of an empty core,
i.e. $S(m/\rho) = 0$, decays in $m$. Thus, for large $m$ most trajectories $\{S(n)\}$
abruptly jump from having no core for $n < n_d$ to a linear in $m$ core size at
the random critical edge number $n_d$. By the monotonicity of $S(n)$ we further
see that $\mathbb{P}_m\{n_d \le m/\rho\} = P_l(m/\rho, \rho)$, hence Theorem 6.8 determines the
asymptotic distribution of $n_d$. Indeed, as detailed in [32, Remark 2.5], upon
expressing $n$ in terms of $m$ in equation (6.27) we find that the distributions
of $\hat{n}_d = a_l (\rho_d n_d - m) / \sqrt{m/\rho_d} + b_l \kappa \rho_d^{1/6} m^{-1/6}$ converge point-wise to the
standard normal law at a rate which is faster than $m^{-5/26+\epsilon}$.

Remark 6.11. The same techniques are applicable for other properties of
the core in the 'scaling regime' $\rho_n = \rho_d + r n^{-1/2}$. For example, as shown in
[32, Remark 2.6], for $m = n\rho_n$ and conditional to the existence of a non-empty
core, $(S(n) - n(1 - \theta_d))/n^{3/4}$ converges in distribution as $n \to \infty$
to $(4 Q_{11} / \widetilde{F}^2)^{1/4} Z_r$ where $Z_r$ is a non-degenerate random variable (whose
density is explicitly provided there). In particular, the $O(n^{1/2})$ fluctuations
of the core size at fixed $\rho < \rho_d$ are enhanced to $O(n^{3/4})$ fluctuations near
the critical point.